From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Alex Chao; Apurva Mody; Darren Edge; Dasha Metropolitansky; Ha Trinh; Jonathan Larson; Joshua Bradley; Newman Cheng; Robert Osazuwa Ness; Steven Truitt

arxiv: 2404.16130 · v2 · submitted 2024-04-24 · 💻 cs.CL · cs.AI · cs.IR

Pith reviewed 2026-05-11 05:07 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.IR

keywords GraphRAG · Retrieval-Augmented Generation · Query-Focused Summarization · Entity Knowledge Graph · Community Summaries · Global Question Answering · Large Language Models

The pith

GraphRAG builds entity knowledge graphs and community summaries to answer global questions over large private text collections more comprehensively than standard RAG.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GraphRAG to address the limitation that standard retrieval-augmented generation fails on broad, corpus-wide questions which are really query-focused summarization tasks. It uses an LLM in two stages to first extract an entity knowledge graph from the source documents and then generate summaries for communities of closely related entities. When a question arrives, the system produces a partial response from each community summary and then combines those into a single final answer. The authors test this on global sensemaking questions over datasets in the one-million-token range and report substantial gains in both comprehensiveness and diversity of answers compared with a conventional RAG baseline.

Core claim

GraphRAG constructs a graph index by deriving an entity knowledge graph from the source documents and then pregenerating community summaries for all groups of closely related entities; given a question, each community summary generates a partial response and all partial responses are summarized into a final answer, yielding substantial improvements over a conventional RAG baseline in both comprehensiveness and diversity for global sensemaking questions over datasets in the 1 million token range.

What carries the argument

Two-stage LLM-based graph indexing that first builds an entity knowledge graph and then pregenerates community summaries for groups of related entities, which are used to create and aggregate partial responses.
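
The two indexing stages can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `extract_entities` stands in for the LLM extraction prompt (here it simply picks capitalized words), connected components stand in for the graph community detection step, and `summarize_community` stands in for the LLM summarization call. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def extract_entities(chunk):
    """Toy stand-in for the LLM extraction step: capitalized words are entities."""
    return sorted({w.strip(".,") for w in chunk.split() if w[:1].isupper()})

def build_graph(chunks):
    """Stage 1: link entities that co-occur in the same text chunk."""
    graph = defaultdict(set)
    for chunk in chunks:
        entities = extract_entities(chunk)
        for e in entities:
            graph[e]  # make sure isolated entities still appear as nodes
        for a, b in combinations(entities, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def communities(graph):
    """Connected components, a crude stand-in for community detection."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(graph[node] - seen)
        groups.append(sorted(group))
    return groups

def summarize_community(group):
    """Stage 2 placeholder: a real system would prompt an LLM here."""
    return "Community of related entities: " + ", ".join(group)

chunks = ["Alice met Bob in Paris.", "Bob works with Carol.", "Dave lives in Oslo."]
graph = build_graph(chunks)
summaries = [summarize_community(g) for g in communities(graph)]
```

On the three toy chunks this yields two communities, one around Alice, Bob, Carol and Paris and one around Dave and Oslo, each with its own pregenerated summary that query-time code can consume.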

If this is right

GraphRAG can answer questions that require understanding an entire document collection rather than isolated passages.
The method scales query-focused summarization to the same quantities of text handled by typical RAG systems.
Partial responses from community summaries can be synthesized into final answers that improve both breadth and variety over direct retrieval.
The two-stage indexing allows the system to handle both narrow retrieval questions and broad sensemaking questions within one framework.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be tested on domain-specific corpora such as legal contracts or scientific papers where global pattern detection is valuable.
If community detection quality varies, the method might benefit from iterative refinement of the graph index based on question type.
Hybrid systems could route local questions to standard RAG and global questions to GraphRAG without changing the underlying LLM.
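
The hybrid routing idea in the last item can be made concrete with a small sketch. It is purely illustrative: the keyword cues, the `is_global` heuristic, and the `route` function are assumptions of this example, not something the paper proposes, and a production router would more plausibly use a classifier or an LLM call.

```python
# Hypothetical cue phrases that suggest a corpus-wide (global) question.
GLOBAL_CUES = ("main themes", "overall", "across the", "summarize", "trends")

def is_global(question):
    """Crude heuristic: a question is global if it contains a cue phrase."""
    q = question.lower()
    return any(cue in q for cue in GLOBAL_CUES)

def route(question, local_rag, graph_rag):
    """Send global questions to the graph pipeline, everything else to standard RAG."""
    handler = graph_rag if is_global(question) else local_rag
    return handler(question)

# Usage with placeholder backends; both callables are stand-ins.
local_backend = lambda q: "local"
graph_backend = lambda q: "global"
themes = route("What are the main themes in the dataset?", local_backend, graph_backend)
detail = route("When was the contract signed?", local_backend, graph_backend)
```

Both backends can share the same underlying LLM, since the router only decides which index is consulted.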

Load-bearing premise

LLM-generated entity graphs and community summaries accurately and comprehensively capture the source material without introducing errors, omissions, or biases that undermine the final combined responses.

What would settle it

A human evaluation on a corpus with independently verified global themes in which GraphRAG answers show no measurable gain in comprehensiveness or diversity, or in which the community summaries omit or distort major themes present in the raw text.

Original abstract

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, do not scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose GraphRAG, a graph-based approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text. Our approach uses an LLM to build a graph index in two stages: first, to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that GraphRAG leads to substantial improvements over a conventional RAG baseline for both the comprehensiveness and diversity of generated answers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

GraphRAG precomputes LLM entity graphs and community summaries to handle global questions over large corpora better than standard RAG, but the gains rest on unvalidated extraction quality.

GraphRAG uses an LLM to extract entities into a knowledge graph from the source documents, then generates summaries for detected communities of related entities. At query time it runs partial answers off each community summary and folds those into a final response. This targets the clear weakness in ordinary RAG when users ask broad questions like main themes or overall patterns across a whole private collection. The two-stage indexing plus map-reduce over communities is the concrete new piece; it is not just another retrieval trick but a deliberate precomputation step to make global sensemaking feasible at scale. The abstract reports better comprehensiveness and diversity on 1-million-token datasets, which matches the practical need the authors describe. That part of the contribution is straightforward and addresses a gap that many RAG deployments actually hit.

The evaluation claim is the soft spot. The abstract gives no numbers on metrics, no baseline code or dataset details, and no sign of human checks on whether the extracted entities or community summaries are accurate or complete. Without those, it is hard to know whether the reported lift comes from the graph structure itself or simply from running more LLM calls. The stress-test concern lands: any systematic omission or bias in the first two LLM stages would flow straight into the final answers. If the full paper has ablations against oracle graphs or inter-annotator scores on the index, that would change the picture; otherwise the central result stays provisional.

This is for teams already running RAG on private data who need global queries to work without manual chunking. A practitioner reader can take the pipeline description and try it, even if they have to fill in the missing eval details themselves. It is worth a serious referee because the problem is real, the method is implementable, and the engineering framing is honest.

Send it to review and ask for the full experimental section plus any validation of the graph quality.

Referee Report

2 major / 1 minor

Summary. The paper proposes GraphRAG, a two-stage LLM-driven indexing method that first extracts an entity knowledge graph from source documents and then generates community summaries over related entity groups. For global sensemaking queries, it produces partial answers from each community summary and applies a final map-reduce summarization step. The central empirical claim is that this yields substantial gains in answer comprehensiveness and diversity relative to a conventional RAG baseline on corpora of approximately 1 million tokens.

Significance. If the reported gains prove robust under detailed evaluation, the work would meaningfully advance RAG systems by addressing their documented weakness on global queries through graph-based indexing and hierarchical summarization. The approach is an empirical engineering contribution that combines existing ideas in a scalable way for private corpora.

major comments (2)

[Abstract and §4] Abstract and §4 (Experiments): the central claim of 'substantial improvements' in comprehensiveness and diversity is stated without any quantitative results, exact metric definitions, dataset descriptions, baseline implementation details, or statistical significance tests. This information is load-bearing for assessing whether the gains arise from the graph structure rather than additional LLM calls.
[§3] §3 (Method): the two-stage indexing (entity KG construction followed by community summarization) is presented without any human validation, inter-annotator agreement scores, or ablation against oracle graphs. Because downstream partial responses and the final summary are also LLM-generated, systematic extraction errors or omissions would propagate directly into the reported gains, yet no such checks are described.

minor comments (1)

[§3.3] The description of how community summaries are combined in the final response step could be clarified with a short pseudocode or diagram to make the map-reduce flow explicit.
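
A sketch of what such pseudocode might look like, written as runnable Python. It is an assumption-laden illustration rather than the paper's procedure: `answer_from_summary` replaces the LLM map call with simple term overlap, `combine` replaces the LLM reduce call with concatenation, and the relevance score, the stopword list, and the `max_partials` cutoff are all hypothetical.

```python
STOPWORDS = {"what", "are", "the", "and", "in", "of", "on", "main"}

def terms(text):
    """Lowercased content words of a text, with a tiny stopword filter."""
    return {w.lower().strip("?.,") for w in text.split()} - STOPWORDS

def answer_from_summary(question, summary):
    """Map step placeholder: one partial response per community summary."""
    score = len(terms(question) & terms(summary))
    return {"text": summary, "score": score}

def combine(partials, max_partials=2):
    """Reduce step placeholder: drop irrelevant partials, merge the best ones."""
    useful = sorted(
        (p for p in partials if p["score"] > 0),
        key=lambda p: p["score"],
        reverse=True,
    )
    return " ".join(p["text"] for p in useful[:max_partials])

def global_answer(question, community_summaries):
    """Map over every community summary, then reduce to a single answer."""
    partials = [answer_from_summary(question, s) for s in community_summaries]
    return combine(partials)

community_summaries = [
    "Trade policy debates dominate the economic coverage.",
    "Local sports results and team transfers.",
    "Economic forecasts focus on inflation and trade.",
]
answer = global_answer("What are the main economic and trade themes?", community_summaries)
```

In this toy run the sports community contributes nothing and is dropped before the reduce step, while the two economics communities are merged into the final answer.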

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below, indicating where revisions will be made to improve clarity and rigor.

Referee: [Abstract and §4] Abstract and §4 (Experiments): the central claim of 'substantial improvements' in comprehensiveness and diversity is stated without any quantitative results, exact metric definitions, dataset descriptions, baseline implementation details, or statistical significance tests. This information is load-bearing for assessing whether the gains arise from the graph structure rather than additional LLM calls.

Authors: We agree that the abstract and §4 would be strengthened by explicit quantitative details. In the revised manuscript we will update the abstract to reference key quantitative findings from the experiments and expand §4 to provide exact metric definitions (human Likert-scale ratings for comprehensiveness and diversity), dataset descriptions, baseline implementation specifics, and statistical significance results. We will also add analysis that isolates the contribution of the graph indexing from the total number of LLM calls.

revision: yes

Referee: [§3] §3 (Method): the two-stage indexing (entity KG construction followed by community summarization) is presented without any human validation, inter-annotator agreement scores, or ablation against oracle graphs. Because downstream partial responses and the final summary are also LLM-generated, systematic extraction errors or omissions would propagate directly into the reported gains, yet no such checks are described.

Authors: We acknowledge the value of validating the intermediate indexing steps. We will revise §3 to discuss potential error propagation from LLM-based entity and community extraction and include any available internal checks or related evidence. A full-scale human validation or oracle-graph ablation is resource-intensive at the corpus scale, but we will add a limitation statement and, where feasible, a small-scale comparison to better contextualize the results.

revision: partial

Circularity Check

0 steps flagged

No circularity: empirical engineering contribution with independent evaluation

The paper proposes GraphRAG as a two-stage LLM-based indexing method (entity KG construction followed by community summarization) for global query-focused summarization, then reports empirical gains in comprehensiveness and diversity over a standard RAG baseline on 1M-token datasets. No equations, first-principles derivations, fitted parameters, or predictions appear in the abstract or the described method. The central claim is an empirical comparison rather than a result that reduces to its inputs by construction. No load-bearing self-citations, uniqueness theorems, or ansatz smuggling are involved. The evaluation metrics and baseline are external to the indexing process itself, so the argument does not depend on its own conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

No free parameters or invented entities are introduced; the approach rests on standard assumptions about LLM extraction capabilities and graph community structure.

axioms (2)

domain assumption: Large language models can extract entities and relations from source text to form a usable knowledge graph.
Invoked in the first stage of index construction.

domain assumption: Communities of related entities identified via graph algorithms yield summaries that collectively support global question answering.
Invoked in the second stage and response generation.

pith-pipeline@v0.9.0 · 5579 in / 1270 out tokens · 34713 ms · 2026-05-11T05:07:17.015621+00:00 · methodology

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations
cs.CL 2026-05 conditional novelty 8.0

GroupMemBench shows leading LLM memory systems reach only 46% average accuracy on multi-party tasks, with a simple BM25 baseline matching or beating most of them.
MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare
cs.AI 2026-05 conditional novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for ...
ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts
cs.CR 2026-05 unverdicted novelty 8.0

ShadowMerge exploits relation-channel conflicts to poison graph-based agent memory, achieving 93.8% average attack success rate on Mem0 and real-world datasets while bypassing existing defenses.
ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts
cs.CR 2026-05 unverdicted novelty 8.0

ShadowMerge poisons graph-based agent memory by creating relation-channel conflicts that get extracted and retrieved, achieving 93.8% attack success rate on Mem0 and datasets like PubMedQA while evading prior defenses.
ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts
cs.CR 2026-05 unverdicted novelty 8.0

ShadowMerge poisons graph-based agent memory via relation-channel conflicts using an AIR pipeline, achieving 93.8% average attack success rate on Mem0 and three real-world datasets while bypassing existing defenses.
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
cs.CR 2026-05 unverdicted novelty 8.0

Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying ...
MemGym: a Long-Horizon Memory Environment for LLM Agents
cs.CL 2026-05 unverdicted novelty 7.0

MemGym unifies agent gyms into a memory benchmark with isolated scoring across tool-use, research, coding, and computer-use regimes plus a lightweight reward model for tractable coding evaluation.
Argus: Evidence Assembly for Scalable Deep Research Agents
cs.CL 2026-05 unverdicted novelty 7.0

Argus coordinates a Navigator and multiple Searchers via an evidence graph to assemble complete, source-traced answers, yielding benchmark gains up to 12.7 points with 8 parallel agents and 86.2 on BrowseComp with 64 agents.
MeMo: Memory as a Model
cs.CL 2026-05 unverdicted novelty 7.0

MeMo encodes new knowledge into a separate memory model for frozen LLMs, achieving strong performance on BrowseComp-Plus, NarrativeQA, and MuSiQue while capturing cross-document relationships and remaining robust to r...
GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations
cs.CL 2026-05 unverdicted novelty 7.0

GroupMemBench is a new benchmark exposing that LLM agent memory systems fail on group conversation properties like speaker-grounded tracking and audience-adapted responses, with top systems at 46% accuracy.
Thinking Ahead: Prospection-Guided Retrieval of Memory with Language Models
cs.IR 2026-05 conditional novelty 7.0

PGR expands user queries into plausible future steps via Tree-of-Thought or chains and uses them as retrieval probes, delivering nearly 3x recall gains on the new MemoryQuest benchmark for low-similarity memory retrieval.
Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation
cs.AI 2026-05 unverdicted novelty 7.0

PyRAG turns multi-hop reasoning into executable Python code over retrieval tools for explicit, verifiable step-by-step RAG.
MEME: Multi-entity & Evolving Memory Evaluation
cs.LG 2026-05 unverdicted novelty 7.0

All tested LLM memory systems fail at dependency reasoning in multi-entity evolving scenarios, with only an expensive file-based setup showing partial recovery.
Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems
cs.AI 2026-05 unverdicted novelty 7.0

Goal-Mem improves RAG memory retrieval in agentic LLMs by explicit goal decomposition and backward chaining via Natural Language Logic, outperforming nine baselines on multi-hop and implicit inference tasks.
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
cs.LG 2026-05 unverdicted novelty 7.0

MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.
DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning
cs.CL 2026-05 unverdicted novelty 7.0

DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.
MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs
cs.AI 2026-05 unverdicted novelty 7.0

MAGE uses a four-subgraph co-evolutionary knowledge graph plus dual bandits to externalize and retrieve experience for stable self-evolution of frozen language-model agents, showing gains on nine diverse benchmarks.
SEM-RAG: Structure-Preserving Multimodal Graph Compilation and Entropy-Guided Retrieval for Telecommunication Standards
eess.SP 2026-05 unverdicted novelty 7.0

SEM-RAG compiles telecommunication standards into structure-preserving graphs and uses entropy-guided retrieval to reach 94.1% accuracy on TeleQnA and 93.8% on ORAN-Bench-13K while reducing indexing token usage compar...
When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory
cs.AI 2026-05 unverdicted novelty 7.0

A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.
The Context Gathering Decision Process: A POMDP Framework for Agentic Search
cs.AI 2026-05 accept novelty 7.0

Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no perf...
MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents
cs.CL 2026-05 unverdicted novelty 7.0

MANTRA automatically synthesizes SMT-validated compliance benchmarks for LLM agents from natural language manuals and tool schemas, producing 285 tasks across 6 domains with minimal human effort.
SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States
cs.CL 2026-05 unverdicted novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents
cs.MA 2026-05 unverdicted novelty 7.0

MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
cs.CL 2026-05 unverdicted novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation
cs.AI 2026-04 unverdicted novelty 7.0

XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.
Skill Retrieval Augmentation for Agentic AI
cs.CL 2026-04 unverdicted novelty 7.0

Agents improve when they retrieve skills on demand from large corpora, yet current models cannot selectively decide when to load or ignore a retrieved skill.
A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding
cs.AI 2026-04 unverdicted novelty 7.0

A-MAR decomposes art queries into reasoning plans to condition retrieval, leading to improved explanation quality and multi-step reasoning on art benchmarks compared to baselines.
Structure Guided Retrieval-Augmented Generation for Factual Queries
cs.IR 2026-04 unverdicted novelty 7.0

SG-RAG frames retrieval as subgraph matching to ensure LLMs meet every condition in factual queries and reports large gains over baselines on a new 120k-pair ERQA dataset.
ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation
cs.CL 2026-04 unverdicted novelty 7.0

ArbGraph resolves conflicts in RAG evidence by constructing a conflict-aware graph of atomic claims and applying intensity-driven iterative arbitration to suppress unreliable claims prior to generation.
STRIDE: Strategic Iterative Decision-Making for Retrieval-Augmented Multi-Hop Question Answering
cs.AI 2026-04 unverdicted novelty 7.0

STRIDE uses a meta-planner for entity-agnostic reasoning skeletons and a supervisor for dependency-aware execution to improve retrieval-augmented multi-hop QA.
SAGER: Self-Evolving User Policy Skills for Recommendation Agent
cs.IR 2026-04 unverdicted novelty 7.0

SAGER equips LLM recommendation agents with per-user evolving policy skills via two-representation architecture, contrastive CoT diagnosis, and skill-augmented listwise reasoning, yielding SOTA gains orthogonal to mem...
ROZA Graphs: Self-Improving Near-Deterministic RAG through Evidence-Centric Feedback
cs.AI 2026-04 unverdicted novelty 7.0

ROZA graphs enable self-improving RAG by storing evidence-specific reasoning chains, yielding up to 10.6pp accuracy gains and 46% lower cost through graph traversal feedback.
DOTRAG: Retrieval-Time Reasoning Along Paths
cs.IR 2026-04 unverdicted novelty 7.0

DotRAG reformulates graph retrieval as query-guided path reasoning with Division of Thought, reporting SOTA results on MetaQA and UltraDomain for multi-hop tasks.
MisEdu-RAG: A Misconception-Aware Dual-Hypergraph RAG for Novice Math Teachers
cs.IR 2026-04 unverdicted novelty 7.0

MisEdu-RAG builds concept and instance hypergraphs for two-stage retrieval of pedagogical knowledge and student errors, improving feedback quality on the MisstepMath benchmark by 10.95% token-F1 and up to 15.3% on res...
AnnoRetrieve: Efficient Structured Retrieval for Unstructured Document Analysis
cs.IR 2026-04 unverdicted novelty 7.0

AnnoRetrieve uses auto-generated structured schemas and queries to retrieve information from unstructured documents more efficiently and accurately than embedding-based methods.
Do We Still Need GraphRAG? Benchmarking RAG and GraphRAG for Agentic Search Systems
cs.IR 2026-04 unverdicted novelty 7.0

Agentic search narrows the gap between dense RAG and GraphRAG but does not remove GraphRAG's advantage on complex multi-hop reasoning.
Semantic Level of Detail for Knowledge Graphs: Discovering Abstraction Boundaries via Spectral Heat Diffusion
cs.LG 2026-03 unverdicted novelty 7.0

SLoD detects emergent scale boundaries in knowledge graphs by applying spectral heat diffusion to Poincare embeddings, recovering planted hierarchies in synthetic data and aligning with taxonomic depths in WordNet wit...
GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning
cs.AI 2026-03 unverdicted novelty 7.0

GraphScout trains LLMs to autonomously synthesize structured training data from knowledge graphs via flexible exploration tools, enabling a 4B model to outperform larger LLMs by 16.7% on average with fewer inference t...
AtomicRAG: Atom-Entity Graphs for Retrieval-Augmented Generation
cs.IR 2026-02 unverdicted novelty 7.0

AtomicRAG replaces chunk-based and triple-based GraphRAG with atom-entity graphs that store facts as atomic units and use personalized PageRank plus relevance filtering to achieve higher retrieval accuracy and reasoni...
KRONE: Scalable LLM-Augmented Log Anomaly Detection via Hierarchical Abstraction
cs.DB 2026-02 conditional novelty 7.0

KRONE derives semantic execution hierarchies from flat logs to enable modular multi-level anomaly detection with hybrid local and nested-aware detectors plus limited LLM use, delivering 10% F1 gains and over 100x data...
Autonomous Knowledge Graph Exploration with Adaptive Breadth-Depth Retrieval
cs.AI 2026-01 unverdicted novelty 7.0

ARK adaptively retrieves from knowledge graphs using global lexical search and one-hop neighborhood exploration, reaching 59.1% Hit@1 on STaRK with up to 31.4% gains over training-free baselines and enabling distillat...
M$^3$KG-RAG: Multi-hop Multimodal Knowledge Graph-enhanced Retrieval-Augmented Generation
cs.CL 2025-12 unverdicted novelty 7.0

M³KG-RAG improves multimodal reasoning in large language models by constructing multi-hop knowledge graphs and selectively pruning retrieved context with GRASP.
VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models
cs.CL 2025-12 conditional novelty 7.0

VLegal-Bench supplies 10,450 expert-validated samples for evaluating LLMs on Vietnamese legal questions, retrieval, multi-step reasoning, and scenario solving.
Deterministic Legal Agents: A Canonical Primitive API for Auditable Reasoning over Temporal Knowledge Graphs
cs.AI 2025-10 unverdicted novelty 7.0

The paper specifies the SAT-Graph API, a canonical primitive interface that enables auditable, deterministic reasoning over temporal knowledge graphs by isolating uncertainty to intent translation and narrative synthesis.
mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA
cs.CV 2025-08 unverdicted novelty 7.0

mKG-RAG constructs multimodal KGs via MLLM-driven extraction and vision-text matching then applies dual-stage query-aware retrieval to achieve new state-of-the-art results on knowledge-based VQA.
OKG-LLM: Aligning Ocean Knowledge Graph with Observation Data via LLMs for Global Sea Surface Temperature Prediction
cs.LG 2025-07 unverdicted novelty 7.0

OKG-LLM constructs an Ocean Knowledge Graph, learns its embeddings, fuses them with SST observations, and applies an LLM to outperform prior methods on global sea surface temperature prediction.
From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems
cs.MA 2025-06 accept novelty 7.0

A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.
In-depth Research Impact Summarization through Fine-Grained Temporal Citation Analysis
cs.DL 2025-05 unverdicted novelty 7.0

A framework for nuanced, time-aware research impact summarization using fine-grained temporal citation intents shows moderate to strong correlation with human judgments on insightfulness.
An Ontology-Driven Graph RAG for Legal Norms: A Structural, Temporal, and Deterministic Approach
cs.CL 2025-04 unverdicted novelty 7.0

SAT-Graph RAG is a new ontology-driven temporal graph framework for legal RAG that models Works vs. Expressions, reuses versioned components for temporal states, and treats legislative events as queryable Action nodes...
BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese
cs.CL 2025-04 conditional novelty 7.0

BrowseComp-ZH is a new benchmark of 289 Chinese web questions where even the strongest LLM agents reach only 42.9% accuracy.
DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA
cs.CL 2026-05 unverdicted novelty 6.0

DeferMem decouples memory QA into high-recall retrieval and RL-based query-conditioned evidence distillation, outperforming baselines on LoCoMo and LongMemEval-S with highest accuracy, fastest runtime, and zero API to...
Ex-GraphRAG: Interpretable Evidence Routing for Graph-Augmented LLMs
cs.LG 2026-05 unverdicted novelty 6.0

Ex-GraphRAG replaces GNN encoders with M-GNAN for exact node-level decomposition in graph-augmented LLMs, matching black-box performance on STaRK-Prime while exposing semantic-structural mismatches that degrade multi-...
Format-Constraint Coupling in Knowledge Graph Construction from Statistical Tables
cs.AI 2026-05 unverdicted novelty 6.0

Empirical 2x2 factorial study on 6 statistical datasets shows format and schema constraints in LLM-based KG construction from CSV tables produce super-additive fidelity loss up to +1.180, with mismatched pairs falling...
SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents
cs.CV 2026-05 unverdicted novelty 6.0

SPIKE dual-controller framework raises success rates 5-9 points and cuts tokens 55% in StarDojo agents by reusing strategic plans across stable segments and escalating only at detected events.
EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective
cs.CL 2026-05 unverdicted novelty 6.0

EvoMemBench evaluates 15 memory methods for LLM agents and finds long-context baselines competitive with no single memory approach working consistently across settings.
Argus: Evidence Assembly for Scalable Deep Research Agents
cs.CL 2026-05 unverdicted novelty 6.0

Argus coordinates a Navigator and multiple Searchers via an evidence graph for deep research, reporting average gains of 5.5 points with one Searcher and 12.7 points with eight parallel Searchers across eight benchmar...
H-Mem: A Novel Memory Mechanism for Evolving and Retrieving Agent Memory via a Hybrid Structure
cs.CL 2026-05 unverdicted novelty 6.0

H-Mem introduces a hybrid tree-plus-graph memory mechanism that evolves short-term agent memories into long-term summaries and enables efficient retrieval, reporting state-of-the-art QA results on three benchmarks.
Why Retrieval-Augmented Generation Fails: A Graph Perspective
cs.CL 2026-05 unverdicted novelty 6.0

Attribution graphs reveal that RAG failures arise from shallow fragmented evidence flow in LLMs, enabling topology-based detection and targeted interventions that reinforce question-guided routing.
Cognifold: Always-On Proactive Memory via Cognitive Folding
cs.AI 2026-05 unverdicted novelty 6.0

Cognifold is a new proactive memory architecture that folds event streams into emergent cognitive structures by extending complementary learning systems theory with a prefrontal intent layer and graph topology self-or...
IdeaForge: A Knowledge Graph-Grounded Multi-Agent Framework for Cross-Methodology Innovation Analysis and Patent Claim Generation
cs.AI 2026-05 unverdicted novelty 6.0

IdeaForge combines multiple innovation methodologies through specialist agents on a persistent knowledge graph, using cross-methodology convergent claim linkages to rank and draft patent claims with higher traceabilit...

Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages · cited by 172 Pith papers · 12 internal anchors

[1]

GPT-4 Technical Report

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. (2023). Gpt-4 technical report. arXiv preprint arXiv:2303.08774

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

Gemini: A Family of Highly Capable Multimodal Models

Anil, R., Borgeaud, S., Wu, Y., Alayrac, J.-B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A. M., Hauth, A., et al. (2023). Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805

work page internal anchor Pith review Pith/arXiv arXiv 2023
[3]

Knowledge-augmented language model prompting for zero-shot knowledge graph question answering

Baek, J., Aji, A. F., and Saffari, A. (2023). Knowledge-augmented language model prompting for zero-shot knowledge graph question answering. arXiv preprint arXiv:2306.04136

work page arXiv 2023
[4]

Ban, T., Chen, L., Wang, X., and Chen, H. (2023). From query tools to causal architects: Harnessing large language models for advanced causal discovery from data

work page 2023
[5]

Neural networks for entity matching: A survey

Barlaug, N. and Gulla, J. A. (2021). Neural networks for entity matching: A survey. ACM Transactions on Knowledge Discovery from Data (TKDD), 15(3):1–37

work page 2021
[6]

Baumel, T., Eyal, M., and Elhadad, M. (2018). Query focused abstractive summarization: Incorporating query relevance, multi-document coverage, and summary length constraints into seq2seq models. arXiv preprint arXiv:1801.07704

work page arXiv 2018
[7]

Fast unfolding of communities in large networks

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10):P10008

work page 2008
[8]

Language models are few-shot learners

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901

work page 2020
[9]

Cheng, X., Luo, D., Chen, X., Liu, L., Zhao, D., and Yan, R. (2024). Lift yourself up: Retrieval-augmented text generation with self-memory. Advances in Neural Information Processing Systems, 36

work page 2024
[10]

The data matching process

Christen, P. and Christen, P. (2012). The data matching process. Springer

work page 2012
[11]

Graspy: Graph statistics in python

Chung, J., Pedigo, B. D., Bridgeford, E. W., Varjavand, B. K., Helm, H. S., and Vogelstein, J. T. (2019). Graspy: Graph statistics in python. Journal of Machine Learning Research, 20(158):1–7

work page 2019
[12]

Dang, H. T. (2006). Duc 2005: Evaluation of question-focused summarization systems. In Proceedings of the Workshop on Task-Focused Summarization and Question Answering, pages 48–55

work page 2006
[13]

Duplicate record detection: A survey

Elmagarmid, A. K., Ipeirotis, P. G., and Verykios, V. S. (2006). Duplicate record detection: A survey. IEEE Transactions on knowledge and data engineering, 19(1):1–16

work page 2006
[14]

Es, S., James, J., Espinosa-Anke, L., and Schockaert, S. (2023). Ragas: Automated evaluation of retrieval augmented generation. arXiv preprint arXiv:2309.15217

work page internal anchor Pith review arXiv 2023
[15]

Web-scale information extraction in knowitall: (preliminary results)

Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D. S., and Yates, A. (2004). Web-scale information extraction in knowitall: (preliminary results). In Proceedings of the 13th International Conference on World Wide Web, WWW '04, page 100–110, New York, NY, USA. Association for Computing Machinery

work page 2004
[16]

Feng, Z., Feng, X., Zhao, D., Yang, M., and Qin, B. (2023). Retrieval-generation synergy augmented large language models. arXiv preprint arXiv:2310.05149

work page arXiv 2023
[17]

Fortunato, S. (2010). Community detection in graphs. Physics reports, 486(3-5):75–174

work page 2010
[18]

Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., and Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997

work page internal anchor Pith review Pith/arXiv arXiv 2023
[19]

G-retriever: Retrieval-augmented generation for textual graph understanding and question answering

He, X., Tian, Y., Sun, Y., Chawla, N. V., Laurent, T., LeCun, Y., Bresson, X., and Hooi, B. (2024). G-retriever: Retrieval-augmented generation for textual graph understanding and question answering. arXiv preprint arXiv:2402.07630

work page arXiv 2024
[20]

Large Language Models Cannot Self-Correct Reasoning Yet

Huang, J., Chen, X., Mishra, S., Zheng, H. S., Yu, A. W., Song, X., and Zhou, D. (2023). Large language models cannot self-correct reasoning yet. arXiv preprint arXiv:2310.01798

work page internal anchor Pith review arXiv 2023
[21]

Jacomy, M., Venturini, T., Heymann, S., and Bastian, M. (2014). Forceatlas2, a continuous graph layout algorithm for handy network visualization designed for the gephi software. PLoS ONE 9(6): e98679. https://doi.org/10.1371/journal.pone.0098679

work page doi:10.1371/journal.pone.0098679 2014
[22]

A survey of community detection approaches: From statistical modeling to deep learning

Jin, D., Yu, Z., Jiao, P., Pan, S., He, D., Wu, J., Philip, S. Y., and Zhang, W. (2021). A survey of community detection approaches: From statistical modeling to deep learning. IEEE Transactions on Knowledge and Data Engineering, 35(2):1149–1170

work page 2021
[23]

Knowledge graph-augmented language models for knowledge-grounded dialogue generation

Kang, M., Kwak, J. M., Baek, J., and Hwang, S. J. (2023). Knowledge graph-augmented language models for knowledge-grounded dialogue generation. arXiv preprint arXiv:2305.18846

work page arXiv 2023
[24]

Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive nlp

Khattab, O., Santhanam, K., Li, X. L., Hall, D., Liang, P., Potts, C., and Zaharia, M. (2022). Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive nlp. arXiv preprint arXiv:2212.14024

work page arXiv 2022
[25]

Kim, D., Xie, L., and Ong, C. S. (2016). Probabilistic knowledge graph construction: Compositional and incremental approaches. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, CIKM '16, page 2257–2262, New York, NY, USA. Association for Computing Machinery

work page 2016
[26]

Kim, G., Kim, S., Jeon, B., Park, J., and Kang, J. (2023). Tree of clarifications: Answering ambiguous questions with retrieval-augmented large language models. arXiv preprint arXiv:2310.14696

work page arXiv 2023
[27]

Klein, G., Moon, B., and Hoffman, R. R. (2006). Making sense of sensemaking 1: Alternative perspectives. IEEE intelligent systems, 21(4):70–73

work page 2006
[28]

Kosinski, M. (2024). Evaluating large language models in theory of mind tasks. Proceedings of the National Academy of Sciences, 121(45):e2405460121

work page 2024
[29]

Kuratov, Y., Bulatov, A., Anokhin, P., Sorokin, D., Sorokin, A., and Burtsev, M. (2024). In search of needles in a 11m haystack: Recurrent memory finds what llms miss

work page 2024
[30]

Langchain graphs

LangChain (2024). Langchain graphs. https://langchain-graphrag.readthedocs.io/en/latest/

work page 2024
[31]

Laskar, M. T. R., Hoque, E., and Huang, J. (2020). Query focused abstractive summarization via incorporating query relevance and transfer learning with transformer models. In Advances in Artificial Intelligence: 33rd Canadian Conference on Artificial Intelligence, Canadian AI 2020, Ottawa, ON, Canada, May 13–15, 2020, Proceedings 33, pages 342–348. Springer

work page 2020
[32]

Retrieval-augmented generation for knowledge-intensive nlp tasks

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., et al. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474

work page 2020
[33]

Lost in the Middle: How Language Models Use Long Contexts

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., and Liang, P. (2023). Lost in the middle: How language models use long contexts. arXiv:2307.03172

work page internal anchor Pith review Pith/arXiv arXiv 2023
[34]

GraphRAG Implementation with LlamaIndex - V2

LlamaIndex (2024). GraphRAG Implementation with LlamaIndex - V2. https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/cookbooks/GraphRAG_v2.ipynb

work page 2024
[35]

Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., et al. (2024). Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36

work page 2024
[36]

Manakul, P., Liusie, A., and Gales, M. J. (2023). Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. arXiv preprint arXiv:2303.08896

work page internal anchor Pith review arXiv 2023
[37]

Mao, Y., He, P., Liu, X., Shen, Y., Gao, J., Han, J., and Chen, W. (2020). Generation-augmented retrieval for open-domain question answering. arXiv preprint arXiv:2009.08553

work page arXiv 2020
[38]

Openord: An open-source toolbox for large graph layout

Martin, S., Brown, W. M., Klavans, R., and Boyack, K. (2011). Openord: An open-source toolbox for large graph layout. SPIE Conference on Visualization and Data Analysis (VDA)

work page 2011
[39]

Melnyk, I., Dognin, P., and Das, P. (2022). Knowledge graph generation from text

work page 2022
[40]

Towards effective extraction and evaluation of factual claims

Metropolitansky, D. and Larson, J. (2025). Towards effective extraction and evaluation of factual claims

work page 2025
[41]

The impact of large language models on scientific discovery: a preliminary study using gpt-4

Microsoft (2023). The impact of large language models on scientific discovery: a preliminary study using gpt-4

work page 2023
[42]

Mooney, R. J. and Bunescu, R. (2005). Mining knowledge from text using information extraction. SIGKDD Explor. Newsl., 7(1):3–10

work page 2005
[43]

Nebulagraph launches industry-first graph rag: Retrieval-augmented generation with llm based on knowledge graphs

NebulaGraph (2024). Nebulagraph launches industry-first graph rag: Retrieval-augmented generation with llm based on knowledge graphs. https://www.nebula-graph.io/posts/graph-RAG

work page 2024
[44]

Get started with graphrag: Neo4j’s ecosystem tools

Neo4J (2024). Get started with graphrag: Neo4j’s ecosystem tools. https://neo4j.com/developer-blog/graphrag-ecosystem-tools/

work page 2024
[45]

Newman, M. E. (2006). Modularity and community structure in networks. Proceedings of the national academy of sciences, 103(23):8577–8582

work page 2006
[46]

Ni, J., Shi, M., Stammbach, D., Sachan, M., Ash, E., and Leippold, M. (2024). AFaCTA: Assisting the annotation of factual claim detection with reliable LLM annotators. In Ku, L.-W., Martins, A., and Srikumar, V., editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1890–1912, ...

work page 2024
[47]

Chatgpt: Gpt-4 language model

OpenAI (2023). Chatgpt: Gpt-4 language model

work page 2023
[48]

Does writing with language models reduce content diversity?

Padmakumar, V. and He, H. (2024). Does writing with language models reduce content diversity? ICLR

work page 2024
[49]

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12:2825–2830

work page 2011
[50]

Ram, O., Levine, Y., Dalmedigos, I., Muhlgay, D., Shashua, A., Leyton-Brown, K., and Shoham, Y. (2023). In-context retrieval-augmented language models. Transactions of the Association for Computational Linguistics, 11:1316–1331

work page 2023
[51]

Fabula: Intelligence report generation using retrieval-augmented narrative construction

Ranade, P. and Joshi, A. (2023). Fabula: Intelligence report generation using retrieval-augmented narrative construction. arXiv preprint arXiv:2310.13848

work page arXiv 2023
[52]

Salminen, J., Liu, C., Pian, W., Chi, J., Häyhänen, E., and Jansen, B. J. (2024). Deus ex machina and personas from large language models: Investigating the composition of ai-generated persona descriptions. In Proceedings of the CHI Conference on Human Factors in Computing Systems, pages 1–20

work page 2024
[53]

Sarthi, P., Abdullah, S., Tuli, A., Khanna, S., Goldie, A., and Manning, C. D. (2024). Raptor: Recursive abstractive processing for tree-organized retrieval. arXiv preprint arXiv:2401.18059

work page internal anchor Pith review arXiv 2024
[54]

Scott, K. (2024). Behind the Tech. https://www.microsoft.com/en-us/behind-the-tech

work page 2024
[55]

Shao, Z., Gong, Y., Shen, Y., Huang, M., Duan, N., and Chen, W. (2023). Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy. arXiv preprint arXiv:2305.15294

work page arXiv 2023
[56]

Understanding human-ai workflows for generating personas

Shin, J., Hedderich, M. A., Rey, B. J., Lucero, A., and Oulasvirta, A. (2024). Understanding human-ai workflows for generating personas. In Proceedings of the 2024 ACM Designing Interactive Systems Conference, pages 757–781

work page 2024
[57]

Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., and Yao, S. (2024). Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36

work page 2024
[58]

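Caire-covid: A question answering and query-focused multi-document summarization system for covid-19 scholarly information management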

Su, D., Xu, Y., Yu, T., Siddique, F. B., Barezi, E. J., and Fung, P. (2020). Caire-covid: A question answering and query-focused multi-document summarization system for covid-19 scholarly information management. arXiv preprint arXiv:2005.03975

work page arXiv 2020
[59]

Tan, Z., Zhao, X., and Wang, W. (2017). Representation learning of large-scale knowledge graphs via entity feature combinations. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM '17, page 1777–1786, New York, NY, USA. Association for Computing Machinery

work page 2017
[60]

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

Tang, Y. and Yang, Y. (2024). MultiHop-RAG: Benchmarking retrieval-augmented generation for multi-hop queries. arXiv preprint arXiv:2401.15391

work page internal anchor Pith review arXiv 2024
[61]

Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288

work page internal anchor Pith review Pith/arXiv arXiv 2023
[62]

From Louvain to Leiden: guaranteeing well-connected communities

Traag, V. A., Waltman, L., and Van Eck, N. J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1)

work page 2019
[63]

Trajanoska, M., Stojanov, R., and Trajanov, D. (2023). Enhancing knowledge graph construction using large language models. ArXiv, abs/2305.04676

work page arXiv 2023
[64]

Trivedi, H., Balasubramanian, N., Khot, T., and Sabharwal, A. (2022). Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. arXiv preprint arXiv:2212.10509

work page internal anchor Pith review arXiv 2022
[65]

Wang, J., Liang, Y., Meng, F., Sun, Z., Shi, H., Li, Z., Xu, J., Qu, J., and Zhou, J. (2023a). Is chatgpt a good nlg evaluator? a preliminary study. arXiv preprint arXiv:2303.04048

work page arXiv
[66]

Wang, S., Khramtsova, E., Zhuang, S., and Zuccon, G. (2024). Feb4rag: Evaluating federated search in the context of retrieval augmented generation. arXiv preprint arXiv:2402.11891

work page arXiv 2024
[67]

Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., and Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171

work page internal anchor Pith review Pith/arXiv arXiv 2022
[68]

Knowledge graph prompting for multi-document question answering

Wang, Y., Lipka, N., Rossi, R. A., Siu, A., Zhang, R., and Derr, T. (2023b). Knowledge graph prompting for multi-document question answering

work page
[69]

Text summarization with latent queries

Xu, Y. and Lapata, M. (2021). Text summarization with latent queries. arXiv preprint arXiv:2106.00104

work page arXiv 2021
[70]

HotpotQA: A dataset for diverse, explainable multi-hop question answering

Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., and Manning, C. D. (2018). HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Conference on Empirical Methods in Natural Language Processing (EMNLP)

work page 2018
[71]

Yao, J.-g., Wan, X., and Xiao, J. (2017). Recent advances in document summarization. Knowledge and Information Systems, 53:297–336

work page 2017
[72]

Yao, L., Peng, J., Mao, C., and Luo, Y. (2023). Exploring large language models for knowledge graph completion

work page 2023
[73]

Yates, A., Banko, M., Broadhead, M., Cafarella, M., Etzioni, O., and Soderland, S. (2007). TextRunner: Open information extraction on the web. In Carpenter, B., Stent, A., and Williams, J. D., editors, Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAAC...

work page 2007
[74]

Yuan, X., Li, J., Wang, D., Chen, Y., Mao, X., Huang, L., Xue, H., Wang, W., Ren, K., and Wang, J. (2024). S-eval: Automatic and adaptive test generation for benchmarking safety evaluation of large language models. arXiv preprint arXiv:2405.14191

work page arXiv 2024
[75]

Zhang, J. (2023). Graph-toolformer: To empower llms with graph reasoning ability via prompt augmented by chatgpt. arXiv preprint arXiv:2304.11116

work page arXiv 2023
[76]

Zhang, Y., Zhang, Y., Gan, Y., Yao, L., and Wang, C. (2024a). Causal graph discovery with retrieval-augmented generation based large language models. arXiv preprint arXiv:2402.15301

work page arXiv
[77]

Zhang, Z., Chen, J., and Yang, D. (2024b). Darg: Dynamic evaluation of large language models via adaptive reasoning graph. arXiv preprint arXiv:2406.17271

work page arXiv
[78]

Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al. (2024). Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in Neural Information Processing Systems, 36

work page 2024
[79]

Zhu, Y., Wang, X., Chen, J., Qiao, S., Ou, Y., Yao, Y., Deng, S., Chen, H., and Zhang, N. (2024). Llms for knowledge graph construction and reasoning: Recent capabilities and future opportunities

work page 2024

[1] [1]

GPT-4 Technical Report

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. (2023). Gpt-4 technical report. arXiv preprint arXiv:2303.08774

work page internal anchor Pith review Pith/arXiv arXiv 2023

[2] [2]

Gemini: A Family of Highly Capable Multimodal Models

Anil, R., Borgeaud, S., Wu, Y., Alayrac, J.-B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A. M., Hauth, A., et al. (2023). Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805

work page internal anchor Pith review Pith/arXiv arXiv 2023

[3] [3]

Knowledge-augmented language model prompt- ing for zero-shot knowledge graph question answering

Baek, J., Aji, A. F., and Saffari, A. (2023). Knowledge-augmented language model prompting for zero-shot knowledge graph question answering. arXiv preprint arXiv:2306.04136

work page arXiv 2023

[4] [4]

Ban, T., Chen, L., Wang, X., and Chen, H. (2023). From query tools to causal architects: Harnessing large language models for advanced causal discovery from data

work page 2023

[5] [5]

and Gulla, J

Barlaug, N. and Gulla, J. A. (2021). Neural networks for entity matching: A survey. ACM Transactions on Knowledge Discovery from Data (TKDD) , 15(3):1--37

work page 2021

[6] [6]

Baumel, T., Eyal, M., and Elhadad, M. (2018). Query focused abstractive summarization: Incorporating query relevance, multi-document coverage, and summary length constraints into seq2seq models. arXiv preprint arXiv:1801.07704

work page arXiv 2018

[7] [7]

D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E

Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment , 2008(10):P10008

work page 2008

[8] [8]

D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in neural information processing systems , 33:1877--1901

work page 2020

[9] [9]

Cheng, X., Luo, D., Chen, X., Liu, L., Zhao, D., and Yan, R. (2024). Lift yourself up: Retrieval-augmented text generation with self-memory. Advances in Neural Information Processing Systems , 36

work page 2024

[10] [10]

and Christen, P

Christen, P. and Christen, P. (2012). The data matching process . Springer

work page 2012

[11] [11]

D., Bridgeford, E

Chung, J., Pedigo, B. D., Bridgeford, E. W., Varjavand, B. K., Helm, H. S., and Vogelstein, J. T. (2019). Graspy: Graph statistics in python. Journal of Machine Learning Research , 20(158):1--7

work page 2019

[12] [12]

Dang, H. T. (2006). Duc 2005: Evaluation of question-focused summarization systems. In Proceedings of the Workshop on Task-Focused Summarization and Question Answering , pages 48--55

work page 2006

[13] [13]

K., Ipeirotis, P

Elmagarmid, A. K., Ipeirotis, P. G., and Verykios, V. S. (2006). Duplicate record detection: A survey. IEEE Transactions on knowledge and data engineering , 19(1):1--16

work page 2006

[14] [14]

Es, S., James, J., Espinosa-Anke, L., and Schockaert, S. (2023). Ragas: Automated evaluation of retrieval augmented generation. arXiv preprint arXiv:2309.15217

work page internal anchor Pith review arXiv 2023

[15] [15]

S., and Yates, A

Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D. S., and Yates, A. (2004). Web-scale information extraction in knowitall: (preliminary results). In Proceedings of the 13th International Conference on World Wide Web , WWW '04, page 100–110, New York, NY, USA. Association for Computing Machinery

work page 2004

[16] [16]

Feng, Z., Feng, X., Zhao, D., Yang, M., and Qin, B. (2023). Retrieval-generation synergy augmented large language models. arXiv preprint arXiv:2310.05149

work page arXiv 2023

[17] [17]

Fortunato, S. (2010). Community detection in graphs. Physics reports , 486(3-5):75--174

work page 2010

[18] [18]

Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., and Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997

work page internal anchor Pith review Pith/arXiv arXiv 2023

[19] [19]

G-retriever: Retrieval-augmented generation for textual graph understanding and question answering,

He, X., Tian, Y., Sun, Y., Chawla, N. V., Laurent, T., LeCun, Y., Bresson, X., and Hooi, B. (2024). G-retriever: Retrieval-augmented generation for textual graph understanding and question answering. arXiv preprint arXiv:2402.07630

work page arXiv 2024

[20] [20]

Large Language Models Cannot Self-Correct Reasoning Yet

Huang, J., Chen, X., Mishra, S., Zheng, H. S., Yu, A. W., Song, X., and Zhou, D. (2023). Large language models cannot self-correct reasoning yet. arXiv preprint arXiv:2310.01798

work page internal anchor Pith review arXiv 2023

[21] [21]

Jacomy, M., Venturini, T., Heymann, S., and Bastian, M. (2014). Forceatlas2, a continuous graph layout algorithm for handy network visualization designed for the gephi software. PLoS ONE 9(6): e98679. https://doi.org/10.1371/journal.pone.0098679

work page doi:10.1371/journal.pone.0098679 2014

[22] [22]

Y., and Zhang, W

Jin, D., Yu, Z., Jiao, P., Pan, S., He, D., Wu, J., Philip, S. Y., and Zhang, W. (2021). A survey of community detection approaches: From statistical modeling to deep learning. IEEE Transactions on Knowledge and Data Engineering , 35(2):1149--1170

work page 2021

[23] [23]

Knowledge graph-augmented language models for knowledge-grounded dialogue generation,

Kang, M., Kwak, J. M., Baek, J., and Hwang, S. J. (2023). Knowledge graph-augmented language models for knowledge-grounded dialogue generation. arXiv preprint arXiv:2305.18846

work page arXiv 2023

[24] [24]

Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive nlp,

Khattab, O., Santhanam, K., Li, X. L., Hall, D., Liang, P., Potts, C., and Zaharia, M. (2022). Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive nlp. arXiv preprint arXiv:2212.14024

work page arXiv 2022

[25] [25]

Kim, D., Xie, L., and Ong, C. S. (2016). Probabilistic knowledge graph construction: Compositional and incremental approaches. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management , CIKM '16, page 2257–2262, New York, NY, USA. Association for Computing Machinery

work page 2016

[26] [26]

Kim, G., Kim, S., Jeon, B., Park, J., and Kang, J. (2023). Tree of clarifications: Answering ambiguous questions with retrieval-augmented large language models. arXiv preprint arXiv:2310.14696

work page arXiv 2023

[27] [27]

Klein, G., Moon, B., and Hoffman, R. R. (2006). Making sense of sensemaking 1: Alternative perspectives. IEEE intelligent systems , 21(4):70--73

work page 2006

[28] [28]

Kosinski, M. (2024). Evaluating large language models in theory of mind tasks. Proceedings of the National Academy of Sciences , 121(45):e2405460121

work page 2024

[29] [29]

Kuratov, Y., Bulatov, A., Anokhin, P., Sorokin, D., Sorokin, A., and Burtsev, M. (2024). In search of needles in a 11m haystack: Recurrent memory finds what llms miss

work page 2024

[30] [30]

Langchain graphs

LangChain (2024). Langchain graphs. https://langchain-graphrag.readthedocs.io/en/latest/

work page 2024

[31] [31]

Laskar, M. T. R., Hoque, E., and Huang, J. (2020). Query focused abstractive summarization via incorporating query relevance and transfer learning with transformer models. In Advances in Artificial Intelligence: 33rd Canadian Conference on Artificial Intelligence, Canadian AI 2020, Ottawa, ON, Canada, May 13--15, 2020, Proceedings 33 , pages 342--348. Springer

work page 2020

[32] [32]

u ttler, H., Lewis, M., Yih, W.-t., Rockt \

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., K \"u ttler, H., Lewis, M., Yih, W.-t., Rockt \"a schel, T., et al. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems , 33:9459--9474

work page 2020

[33] [33]

Lost in the Middle: How Language Models Use Long Contexts

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., and Liang, P. (2023). Lost in the middle: How language models use long contexts. arXiv:2307.03172

work page internal anchor Pith review Pith/arXiv arXiv 2023

[34] [34]

GraphRAG Implementation with LlamaIndex - V2

LlamaIndex (2024). GraphRAG Implementation with LlamaIndex - V2 . https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/cookbooks/GraphRAG_v2.ipynb

work page 2024

[35] [35]

Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., et al. (2024). Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems , 36

work page 2024

[36] [36]

Manakul, P., Liusie, A., and Gales, M. J. (2023). Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. arXiv preprint arXiv:2303.08896

work page internal anchor Pith review arXiv 2023

[37] [37]

Mao, Y., He, P., Liu, X., Shen, Y., Gao, J., Han, J., and Chen, W. (2020). Generation-augmented retrieval for open-domain question answering. arXiv preprint arXiv:2009.08553

work page arXiv 2020

[38] [38]

M., Klavans, R., and Boyack, K

Martin, S., Brown, W. M., Klavans, R., and Boyack, K. (2011). Openord: An open-source toolbox for large graph layout. SPIE Conference on Visualization and Data Analysis (VDA)

work page 2011

[39] [39]

Melnyk, I., Dognin, P., and Das, P. (2022). Knowledge graph generation from text

work page 2022

[40] [40]

and Larson, J

Metropolitansky, D. and Larson, J. (2025). Towards effective extraction and evaluation of factual claims

work page 2025

[41] [41]

The impact of large language models on scientific discovery: a preliminary study using gpt-4

Microsoft (2023). The impact of large language models on scientific discovery: a preliminary study using gpt-4

work page 2023

[42] [42]

Mooney, R. J. and Bunescu, R. (2005). Mining knowledge from text using information extraction. SIGKDD Explor. Newsl. , 7(1):3–10

work page 2005

[43] [43]

Nebulagraph launches industry-first graph rag: Retrieval-augmented generation with llm based on knowledge graphs

NebulaGraph (2024). Nebulagraph launches industry-first graph rag: Retrieval-augmented generation with llm based on knowledge graphs. https://www.nebula-graph.io/posts/graph-RAG

work page 2024

[44] [44]

Get started with graphrag: Neo4j’s ecosystem tools

Neo4J (2024). Get started with graphrag: Neo4j’s ecosystem tools. https://neo4j.com/developer-blog/graphrag-ecosystem-tools/

work page 2024

[45] [45]

Newman, M. E. (2006). Modularity and community structure in networks. Proceedings of the national academy of sciences , 103(23):8577--8582

work page 2006

[46] [46]

Ni, J., Shi, M., Stammbach, D., Sachan, M., Ash, E., and Leippold, M. (2024). AF a CTA : Assisting the annotation of factual claim detection with reliable LLM annotators. In Ku, L.-W., Martins, A., and Srikumar, V., editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages 1890--1912, ...

work page 2024

[47] [47]

Chatgpt: Gpt-4 language model

OpenAI (2023). Chatgpt: Gpt-4 language model

work page 2023

[48] [48]

and He, H

Padmakumar, V. and He, H. (2024). Does writing with language models reduce content diversity? ICLR

work page 2024

[49] [49]

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in python. Journal of Machine Learning Research , 12:2825--2830

[50]

Ram, O., Levine, Y., Dalmedigos, I., Muhlgay, D., Shashua, A., Leyton-Brown, K., and Shoham, Y. (2023). In-context retrieval-augmented language models. Transactions of the Association for Computational Linguistics, 11:1316–1331

[51]

Ranade, P. and Joshi, A. (2023). Fabula: Intelligence report generation using retrieval-augmented narrative construction. arXiv preprint arXiv:2310.13848

[52]

Salminen, J., Liu, C., Pian, W., Chi, J., Häyhänen, E., and Jansen, B. J. (2024). Deus ex machina and personas from large language models: Investigating the composition of ai-generated persona descriptions. In Proceedings of the CHI Conference on Human Factors in Computing Systems, pages 1–20

[53]

Sarthi, P., Abdullah, S., Tuli, A., Khanna, S., Goldie, A., and Manning, C. D. (2024). Raptor: Recursive abstractive processing for tree-organized retrieval. arXiv preprint arXiv:2401.18059

[54]

Scott, K. (2024). Behind the Tech. https://www.microsoft.com/en-us/behind-the-tech

[55]

Shao, Z., Gong, Y., Shen, Y., Huang, M., Duan, N., and Chen, W. (2023). Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy. arXiv preprint arXiv:2305.15294

[56]

Shin, J., Hedderich, M. A., Rey, B. J., Lucero, A., and Oulasvirta, A. (2024). Understanding human-ai workflows for generating personas. In Proceedings of the 2024 ACM Designing Interactive Systems Conference, pages 757–781

[57]

Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., and Yao, S. (2024). Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems, 36

[58]

Su, D., Xu, Y., Yu, T., Siddique, F. B., Barezi, E. J., and Fung, P. (2020). Caire-covid: A question answering and query-focused multi-document summarization system for covid-19 scholarly information management. arXiv preprint arXiv:2005.03975

[59]

Tan, Z., Zhao, X., and Wang, W. (2017). Representation learning of large-scale knowledge graphs via entity feature combinations. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM '17, page 1777–1786, New York, NY, USA. Association for Computing Machinery

[60]

Tang, Y. and Yang, Y. (2024). MultiHop-RAG: Benchmarking retrieval-augmented generation for multi-hop queries. arXiv preprint arXiv:2401.15391

[61]

Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288

[62]

Traag, V. A., Waltman, L., and Van Eck, N. J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1)

[63]

Trajanoska, M., Stojanov, R., and Trajanov, D. (2023). Enhancing knowledge graph construction using large language models. ArXiv, abs/2305.04676

[64]

Trivedi, H., Balasubramanian, N., Khot, T., and Sabharwal, A. (2022). Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. arXiv preprint arXiv:2212.10509

[65]

Wang, J., Liang, Y., Meng, F., Sun, Z., Shi, H., Li, Z., Xu, J., Qu, J., and Zhou, J. (2023a). Is chatgpt a good nlg evaluator? a preliminary study. arXiv preprint arXiv:2303.04048

[66]

Wang, S., Khramtsova, E., Zhuang, S., and Zuccon, G. (2024). Feb4rag: Evaluating federated search in the context of retrieval augmented generation. arXiv preprint arXiv:2402.11891

[67]

Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., and Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171

[68]

Wang, Y., Lipka, N., Rossi, R. A., Siu, A., Zhang, R., and Derr, T. (2023b). Knowledge graph prompting for multi-document question answering

[69]

Xu, Y. and Lapata, M. (2021). Text summarization with latent queries. arXiv preprint arXiv:2106.00104

[70]

Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., and Manning, C. D. (2018). HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Conference on Empirical Methods in Natural Language Processing (EMNLP)

[71]

Yao, J.-g., Wan, X., and Xiao, J. (2017). Recent advances in document summarization. Knowledge and Information Systems, 53:297–336

[72]

Yao, L., Peng, J., Mao, C., and Luo, Y. (2023). Exploring large language models for knowledge graph completion

[73]

Yates, A., Banko, M., Broadhead, M., Cafarella, M., Etzioni, O., and Soderland, S. (2007). TextRunner: Open information extraction on the web. In Carpenter, B., Stent, A., and Williams, J. D., editors, Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAAC...

[74]

Yuan, X., Li, J., Wang, D., Chen, Y., Mao, X., Huang, L., Xue, H., Wang, W., Ren, K., and Wang, J. (2024). S-eval: Automatic and adaptive test generation for benchmarking safety evaluation of large language models. arXiv preprint arXiv:2405.14191

[75]

Zhang, J. (2023). Graph-toolformer: To empower llms with graph reasoning ability via prompt augmented by chatgpt. arXiv preprint arXiv:2304.11116

[76]

Zhang, Y., Zhang, Y., Gan, Y., Yao, L., and Wang, C. (2024a). Causal graph discovery with retrieval-augmented generation based large language models. arXiv preprint arXiv:2402.15301

[77]

Zhang, Z., Chen, J., and Yang, D. (2024b). Darg: Dynamic evaluation of large language models via adaptive reasoning graph. arXiv preprint arXiv:2406.17271

[78]

Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al. (2024). Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in Neural Information Processing Systems, 36

[79]

Zhu, Y., Wang, X., Chen, J., Qiao, S., Ou, Y., Yao, Y., Deng, S., Chen, H., and Zhang, N. (2024). Llms for knowledge graph construction and reasoning: Recent capabilities and future opportunities
