GROUNDEDKG-RAG: Grounded Knowledge Graph Index for Long-document Question Answering

Andreas Marfurt; Tianyi Zhang

arxiv: 2604.04359 · v1 · submitted 2026-04-06 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-10 20:17 UTC · model grok-4.3

keywords retrieval-augmented generation · knowledge graphs · long-document question answering · semantic role labeling · abstract meaning representation · grounding · hallucinations · NarrativeQA

The pith

GroundedKG-RAG builds a source-linked knowledge graph from SRL and AMR parses to match proprietary long-context models on NarrativeQA at lower cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GroundedKG-RAG, a retrieval-augmented generation system for long-document question answering that constructs its knowledge graph directly from the source text rather than relying on LLM-generated descriptions. Nodes represent entities and actions while edges capture temporal or semantic relations, and every element is explicitly tied back to specific original sentences through semantic role labeling and abstract meaning representation parses. This grounding approach targets three problems in existing RAG systems: high resource use from repeated LLM processing, repetitive hierarchical content, and hallucinations from weak ties to the source. Evaluation on NarrativeQA examples shows the method performs on par with a state-of-the-art proprietary long-context model at reduced cost and outperforms a competitive baseline. The resulting graph remains human-readable, supporting direct auditing of retrieval and generation steps.

Core claim

GroundedKG-RAG defines nodes as entities and actions and edges as temporal or semantic relations, with each node and edge grounded in the original sentences of the source document. The graph is built from semantic role labeling and abstract meaning representation parses, then embedded for retrieval. During inference the same parsing step is applied to the query to retrieve the most relevant grounded sentences, which are passed to the generator for answering. On NarrativeQA the system matches a state-of-the-art proprietary long-context model at lower cost, outperforms a competitive baseline, and supplies an interpretable structure that facilitates auditing and error analysis.
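The retrieval step described above can be pictured with a small sketch. It is not the authors' implementation: the page does not say which embedding model or scoring rule the paper uses, so a bag-of-words cosine similarity stands in for the embedding, and the names `embed`, `cosine`, `retrieve` and `top_k` are hypothetical. The parsing step is also assumed to have happened already, so both the index and the query arrive as lists of element strings, and the toy sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_elements, index, sentences, top_k=2):
    """Score each sentence by its best-matching graph element.

    index is a list of (element_text, sentence_ids) pairs, i.e. every graph
    element carries the ids of the sentences it is grounded in.
    """
    scores = {}
    for query_text in query_elements:
        query_vec = embed(query_text)
        for element_text, sentence_ids in index:
            score = cosine(query_vec, embed(element_text))
            for sid in sentence_ids:
                scores[sid] = max(scores.get(sid, 0.0), score)
    # Highest score first; ties broken by document order.
    ranked = sorted(scores, key=lambda sid: (-scores[sid], sid))
    return [sentences[sid] for sid in ranked[:top_k] if scores[sid] > 0]


# Invented toy document and hand-written "parsed" elements.
sentences = [
    "Peter arrived home.",
    "His mother made camomile tea.",
    "The garden gate was locked.",
]
index = [
    ("Peter", [0]),
    ("arrive home", [0]),
    ("mother", [1]),
    ("make tea", [1]),
    ("garden gate", [2]),
    ("lock", [2]),
]

print(retrieve(["Peter", "arrive home"], index, sentences, top_k=1))
```

The point of the design is that the generator only ever sees original sentences: graph elements are used to find them, but what is returned is source text.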

What carries the argument

GroundedKG: a knowledge graph whose nodes are entities and actions, edges are temporal or semantic relations, and every element is explicitly linked to specific sentences in the source document via SRL and AMR parses.
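One way to picture such a structure is the sketch below. It is a minimal illustration under stated assumptions, not the paper's data model: the class and field names (`Node`, `Edge`, `GroundedGraph`, `sentence_ids`, `evidence`) are hypothetical, the relation label is illustrative, and the toy sentences are invented. The only property it tries to capture is the one stated above, that no node or edge can exist without a link to at least one source sentence.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # "entity" or "action"
    label: str
    sentence_ids: frozenset  # indices of the sentences grounding this node


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str  # illustrative label, e.g. a semantic role or "before"
    sentence_ids: frozenset


@dataclass
class GroundedGraph:
    sentences: list
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def _check(self, sentence_ids):
        # Grounding invariant: at least one valid sentence index per element.
        if not sentence_ids:
            raise ValueError("every element must be grounded in a sentence")
        for sid in sentence_ids:
            if not 0 <= sid < len(self.sentences):
                raise IndexError(f"sentence id {sid} is out of range")

    def add_node(self, node_id, kind, label, sentence_ids):
        self._check(sentence_ids)
        self.nodes[node_id] = Node(node_id, kind, label, frozenset(sentence_ids))

    def add_edge(self, source, target, relation, sentence_ids):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self._check(sentence_ids)
        self.edges.append(Edge(source, target, relation, frozenset(sentence_ids)))

    def evidence(self, node_id):
        """Return the grounding sentences of a node in document order."""
        return [self.sentences[i] for i in sorted(self.nodes[node_id].sentence_ids)]


# Invented toy document, for illustration only.
graph = GroundedGraph(["Peter arrived home.", "His mother made camomile tea."])
graph.add_node("peter", "entity", "Peter", {0})
graph.add_node("arrive", "action", "arrive", {0})
graph.add_node("tea", "entity", "camomile tea", {1})
graph.add_edge("arrive", "peter", "ARG1", {0})

print(graph.evidence("peter"))
```

Because every element carries its sentence ids, an auditor can go from any node or edge straight to the text that produced it, which is the interpretability property the page emphasizes.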

Load-bearing premise

Semantic role labeling and abstract meaning representation parses will reliably extract the key entities, actions, and relations without introducing errors or omissions that affect downstream retrieval and answer quality.

What would settle it

A controlled test on NarrativeQA examples that measures hallucination rate or factual accuracy when the same system is run once with sentence-grounded nodes and edges versus once with the same parses but without sentence-level grounding links.

Original abstract

Retrieval-augmented generation (RAG) systems have been widely adopted in contemporary large language models (LLMs) due to their ability to improve generation quality while reducing the required input context length. In this work, we focus on RAG systems for long-document question answering. Current approaches suffer from a heavy reliance on LLM descriptions resulting in high resource consumption and latency, repetitive content across hierarchical levels, and hallucinations due to no or limited grounding in the source text. To improve both efficiency and factual accuracy through grounding, we propose GroundedKG-RAG, a RAG system in which the knowledge graph is explicitly extracted from and grounded in the source document. Specifically, we define nodes in GroundedKG as entities and actions, and edges as temporal or semantic relations, with each node and edge grounded in the original sentences. We construct GroundedKG from semantic role labeling (SRL) and abstract meaning representation (AMR) parses and then embed it for retrieval. During querying, we apply the same transformation to the query and retrieve the most relevant sentences from the grounded source text for question answering. We evaluate GroundedKG-RAG on examples from the NarrativeQA dataset and find that it performs on par with a state-of-the-art proprietary long-context model at smaller cost and outperforms a competitive baseline. Additionally, our GroundedKG is interpretable and readable by humans, facilitating auditing of results and error analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

GroundedKG-RAG builds a sentence-grounded KG from SRL and AMR parses for RAG on long documents and claims competitive QA results, but the evaluation provides little evidence that the parses are accurate enough to deliver the promised grounding benefits.

The key point is that this system extracts a knowledge graph from the source document using semantic role labeling and abstract meaning representation, defines nodes as entities and actions with edges as relations, and keeps every element tied to specific sentences. Queries get the same treatment, so retrieval pulls grounded sentences for the final answer step on NarrativeQA examples. The authors report parity with a proprietary long-context model at lower cost, gains over a baseline, and a human-readable graph that supports auditing.

Referee Report

2 major / 2 minor

Summary. The paper proposes GroundedKG-RAG, a RAG system for long-document question answering that explicitly extracts a knowledge graph from the source document using semantic role labeling (SRL) and abstract meaning representation (AMR) parses. Nodes represent entities and actions, edges represent temporal or semantic relations, and all elements are grounded to original sentences. The KG is embedded for retrieval; queries undergo the same transformation to retrieve relevant grounded sentences for generation. Evaluation on NarrativeQA examples claims performance parity with a state-of-the-art proprietary long-context model at lower cost, outperformance of a competitive baseline, and human interpretability for auditing.

Significance. If the empirical results hold under rigorous validation, the approach could meaningfully advance grounded RAG for long documents by providing an interpretable, document-derived structured index that mitigates hallucinations and context-length issues. The use of off-the-shelf linguistic parsers for explicit grounding is a practical strength, and the emphasis on auditability addresses a common limitation in black-box retrieval systems.

major comments (2)

[Evaluation] Evaluation section: The central claim that GroundedKG-RAG performs on par with proprietary long-context models and outperforms baselines rests on the assumption that SRL/AMR parses produce a faithful, low-error KG. However, the manuscript provides no parse-quality metrics (e.g., SRL F1 or AMR Smatch scores on NarrativeQA documents), no human validation of node/edge fidelity or coverage of key narrative elements, and no ablation isolating grounding effects from retrieval heuristics or prompt design. SRL and AMR are known to omit implicit arguments and struggle with coreference in long narratives; without these controls, performance gains cannot be confidently attributed to grounding rather than other factors.
[Method] Method (KG construction): The description states that nodes and edges are 'grounded in the original sentences' via SRL and AMR, yet no details are given on how parse outputs are mapped to nodes/edges, how conflicts or incomplete parses are handled, or how the embedding and retrieval step preserves this grounding. This is load-bearing for the 'explicitly grounded' claim and the interpretability advantage.
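The parse-quality check requested in the first major comment could start from something as simple as the sketch below: exact-match precision, recall and F1 over predicted versus gold role assignments. This is a simplified variant for illustration, not a replacement for an established SRL scorer, and it does not compute Smatch. The function name `span_prf`, the tuple layout (sentence id, predicate, role, argument text), the role labels and the toy annotations are all assumptions made for this example.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over sets of role assignments."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented annotations: (sentence id, predicate, role label, argument text).
gold = {
    (0, "arrive", "ARG1", "Peter"),
    (0, "arrive", "ARGM-LOC", "home"),
    (1, "make", "ARG0", "His mother"),
    (1, "make", "ARG1", "some camomile tea"),
}
predicted = {
    (0, "arrive", "ARG1", "Peter"),
    (1, "make", "ARG0", "His mother"),
    (1, "make", "ARG1", "tea"),  # wrong span boundary, counts as a miss
}

precision, recall, f1 = span_prf(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Exact matching is deliberately strict: the truncated span "tea" is penalized both as a false positive and as a missed gold argument, which is the kind of omission the referee worries could propagate into retrieval.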

minor comments (2)

[Method] The specific SRL and AMR parser implementations (e.g., AllenNLP, amrlib) and versions are not named, hindering reproducibility.
[Abstract and Evaluation] The abstract and evaluation description omit concrete metrics, baseline names, dataset splits, and statistical tests; these should be added with tables or figures for clarity.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our paper. We address each of the major comments below and indicate the revisions we will make to the manuscript.

Referee: [Evaluation] The central claim that GroundedKG-RAG performs on par with proprietary long-context models and outperforms baselines rests on the assumption that SRL/AMR parses produce a faithful, low-error KG. However, the manuscript provides no parse-quality metrics (e.g., SRL F1 or AMR Smatch scores on NarrativeQA documents), no human validation of node/edge fidelity or coverage of key narrative elements, and no ablation isolating grounding effects from retrieval heuristics or prompt design. SRL and AMR are known to omit implicit arguments and struggle with coreference in long narratives; without these controls, performance gains cannot be confidently attributed to grounding rather than other factors.

Authors: We agree with the referee that additional validation of the parse quality is necessary to support our claims. In the revised version, we will include SRL F1 and AMR Smatch scores computed on the NarrativeQA documents. We will also add a human evaluation on a subset of the extracted KGs to assess fidelity and coverage. An ablation study will be included to separate the effects of grounding from other components. These additions will allow better attribution of performance improvements to the proposed grounding approach, despite the known limitations of the parsers. revision: yes
Referee: [Method] The description states that nodes and edges are 'grounded in the original sentences' via SRL and AMR, yet no details are given on how parse outputs are mapped to nodes/edges, how conflicts or incomplete parses are handled, or how the embedding and retrieval step preserves this grounding. This is load-bearing for the 'explicitly grounded' claim and the interpretability advantage.

Authors: We acknowledge that the Method section lacks sufficient implementation details. We will revise it to provide a step-by-step description of how SRL and AMR outputs are converted into nodes (entities and actions) and edges (temporal or semantic relations). This will include our strategies for resolving conflicts between parses, handling incomplete parses, and ensuring grounding by linking each element back to specific sentences. We will also detail the embedding process and retrieval mechanism to show how grounding is preserved for interpretability. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation or evaluation chain

The manuscript presents a system architecture (GroundedKG-RAG) that extracts a knowledge graph via off-the-shelf SRL and AMR parsers, embeds it, and performs retrieval-augmented QA. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claims rest on the empirical comparison to NarrativeQA baselines and a proprietary long-context model, which are independent of any self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that off-the-shelf SRL and AMR parsers produce sufficiently accurate and complete parses to ground the KG without loss of critical information.

axioms (1)

domain assumption: Semantic role labeling and abstract meaning representation parses accurately capture entities, actions, and relations in the document.
Invoked to construct nodes and edges directly from source text.

invented entities (1)

GroundedKG · no independent evidence
purpose: Provide an explicitly grounded index for retrieval that links back to original sentences.
New structure defined by the paper with nodes as entities/actions and edges as relations.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We construct GroundedKG from semantic role labeling (SRL) and abstract meaning representation (AMR) parses... nodes represent entities and actions, edges represent semantic relations"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We evaluate GroundedKG-RAG on examples from the NarrativeQA dataset and find that it performs on par with a state-of-the-art proprietary long-context model"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages · 1 internal anchor

[1]

GROUNDEDKG-RAG: Grounded Knowledge Graph Index for Long-document Question Answering

Introduction. Retrieval-Augmented Generation (RAG) is a technique that retrieves and incorporates additional information from an external knowledge source to answer user queries more accurately. Its application scenarios are diverse. For example, a chatbot may retrieve up-to-date information released after the model’s training cutoff to address users...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[2]

watch"); textscontains textual mentions (e.g.,

Related Work. Several recent approaches have explored question answering over long documents using retrieval-augmented generation (RAG). One line of work aims to extract node–edge pairs to build a graph index for RAG systems. LightRAG (Guo et al., 2025), HippoRAG (Gutiérrez et al., 2024) and HippoRAGv2 (Gutiérrez et al., 2025) create a knowledge gra...

work page 2025
[3]

Dataset. NarrativeQA. NarrativeQA (Kočiský et al., 2017) is a large reading comprehension benchmark designed to evaluate deep understanding and reasoning over long texts

Experiments. 4.1. Dataset. NarrativeQA. NarrativeQA (Kočiský et al., 2017) is a large reading comprehension benchmark designed to evaluate deep understanding and reasoning over long texts. It consists of 1,572 full-length stories from books and movie scripts, the split for books are 548 train, 58 validation and 177 test. Annotators wrote 46,765 question–answer...

work page 2017
[4]

What did Peter do after arriving home?

Results. We evaluate our GroundedKG-RAG by comparing with different baselines and ablating our design choices. Baseline Comparison. Our main results are shown in Table 1. We compare our GroundedKG with a closed-book LLM (without any context, no context), an open-book LLM (offering the full book content as context, full context), and GraphRAG. Since the Narrative...

work page
[5]

the four little kids’ father

Error Analysis. We distinguish between (1) KG construction, (2) retrieval and (3) answer generation errors. The discussion of mapping “the four little kids’ father” to the father concept is an example of an error of the first kind (see Section 5). This works better in graphs built from AMR parses than from SRL parses. The second kind is a failure to retriev...

work page
[6]

it” has been resolved to “some camomile tea

Graph Case Study. We present a simple example to illustrate the creation of our GroundedKG from SRL and AMR parses in Table 3. We can clearly observe that AMR-based graph grounded coreference-resolved entities (“it” has been resolved to “some camomile tea” in the second clause) into the same node, in this case tea, while SRL created a duplicate node for it....

work page
[7]

The nodes represent the entities and actions, and the edges represent the temporal or semantic relation between them

Conclusion. In this work, we propose the GroundedKG-RAG framework, where each node and edge in the GroundedKG is directly extracted from and grounded in the source document at the sentence level. The nodes represent the entities and actions, and the edges represent the temporal or semantic relation between them. For the GroundedKG construction, we experiment...

work page
[8]

Acknowledgements We thank getAbstract for the support and partial funding of this project

work page
[9]

Bibliographical References. Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2025. From local to global: A graph rag approach to query-focused summarization. Zirui Guo, Lianghao Xia, Yanhua Yu, Tu Ao, and Chao Huang. 2025. LightRAG: Simple and fast re...

work page 2025
[10]

Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D

The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106. Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning. 2024. Raptor: Recursive abstractive processing for tree-organized retrieval. In International Conference on Learning Representations (ICLR). V.A. Traag, L....

work page 2024
