Recognition: no theorem link
LiteSemRAG: Lightweight LLM-Free Semantic-Aware Graph Retrieval for Robust RAG
Pith reviewed 2026-05-15 09:45 UTC · model grok-4.3
The pith
LiteSemRAG builds a heterogeneous semantic graph from contextual token embeddings alone and achieves the best MRR@10 across benchmarks with zero LLM token consumption.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LiteSemRAG constructs a heterogeneous semantic graph by exploiting contextual token-level embeddings, explicitly separating surface lexical representations from context-dependent semantic meanings. It introduces a dynamic semantic node construction mechanism with chunk-level context aggregation and adaptive anomaly handling to model polysemy. At query time it performs a two-step semantic-aware retrieval that integrates co-occurrence graph weighting with an isolated semantic recovery mechanism. On three benchmark datasets this produces the best MRR@10 scores and competitive or superior Recall@10 compared with state-of-the-art LLM-based graph RAG systems, all while consuming zero LLM tokens.
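The review gives no algorithmic detail beyond this summary, so the following is only a plausible reading of the two-step retrieval, not the authors' implementation. Every name in the sketch (`two_step_retrieve`, the `edge_w` co-occurrence map, the `alpha` mixing weight) is hypothetical; it assumes the query and chunks carry precomputed embeddings and activated sense-node ids.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def two_step_retrieve(query_vec, query_senses, chunks, edge_w, top_k=10, alpha=0.5):
    """Hypothetical two-step semantic-aware retrieval.

    Step 1 scores each chunk by the co-occurrence edge weight it shares with
    the query's activated sense nodes; Step 2 lets chunks whose sense nodes
    are isolated from the query (no shared edges) be recovered through pure
    embedding similarity instead of being zeroed out.
    """
    scored = []
    for ch in chunks:  # ch: {"id": str, "vec": np.ndarray, "senses": set}
        struct = sum(edge_w.get(frozenset((q, s)), 0.0)
                     for q in query_senses for s in ch["senses"])
        sem = cosine(query_vec, ch["vec"])
        if struct == 0.0:
            score = sem                              # isolated-node semantic recovery
        else:
            score = alpha * struct + (1 - alpha) * sem
        scored.append((score, ch["id"]))
    return sorted(scored, reverse=True)[:top_k]
```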
What carries the argument
A heterogeneous semantic graph built from contextual token-level embeddings, together with dynamic node construction for polysemy handling and a two-step retrieval process combining co-occurrence weighting with semantic recovery.
If this is right
- Zero LLM token consumption occurs in both indexing and querying stages.
- Substantial efficiency gains appear in indexing time and query latency relative to LLM-dependent systems.
- Best-in-class MRR@10 holds across all three evaluated benchmark datasets.
- Recall@10 remains competitive or superior to current LLM-based graph RAG methods.
Where Pith is reading between the lines
- The approach could support RAG on devices where running large language models for retrieval is infeasible.
- Hybrid pipelines might use LiteSemRAG for the retrieval stage and reserve any LLM only for final answer generation (a minimal sketch of this split follows this list).
- The explicit lexical-versus-semantic node separation could transfer to other embedding-driven tasks such as document clustering.
- Scaling experiments on larger corpora would test whether the anomaly-handling step remains effective without additional supervision.
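A minimal sketch of that hybrid split, assuming any LLM-free retriever exposing a `retrieve(query, top_k)` method and a `generate_fn` wrapping whatever LLM produces the final answer; both interfaces are hypothetical, not part of the paper.

```python
def answer(query, retriever, generate_fn, top_k=5):
    """Hybrid RAG: LLM-free retrieval, with the LLM reserved for generation."""
    passages = retriever.retrieve(query, top_k=top_k)   # zero LLM tokens so far
    prompt = ("Answer the question using only the context below.\n\n"
              + "\n---\n".join(passages)
              + f"\n\nQuestion: {query}")
    return generate_fn(prompt)  # the pipeline's only LLM call
```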
Load-bearing premise
Contextual token-level embeddings by themselves can produce a heterogeneous semantic graph that handles polysemy and balances structural and semantic retrieval without any LLM processing.
What would settle it
A polysemy-rich test set on which LiteSemRAG records substantially lower MRR@10 than the strongest LLM-based graph RAG baseline would falsify the claim that embeddings alone suffice.
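For concreteness, the two metrics this verdict turns on are standard IR quantities. A minimal per-query computation is sketched below (helper names are ours; dataset-level scores are the mean over queries):

```python
def mrr_at_10(ranked_ids, relevant_ids):
    """Reciprocal rank of the first relevant document in the top 10, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:10], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_10(ranked_ids, relevant_ids):
    """Fraction of the relevant documents that appear in the top 10."""
    hits = sum(1 for doc_id in ranked_ids[:10] if doc_id in relevant_ids)
    return hits / len(relevant_ids) if relevant_ids else 0.0
```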
Original abstract
Graph-based Retrieval-Augmented Generation (RAG) has shown great potential for improving multi-level reasoning and structured evidence aggregation. However, existing graph-based RAG frameworks heavily rely on exploiting large language models (LLMs) during indexing and querying, leading to high token consumption, computational cost and latency overhead. In this paper, we propose LiteSemRAG, a lightweight, fully LLM-free, semantic-aware graph retrieval framework. LiteSemRAG constructs a heterogeneous semantic graph by exploiting contextual token-level embeddings, explicitly separating surface lexical representations from context-dependent semantic meanings. To robustly model polysemy, we introduce a dynamic semantic node construction mechanism with chunk-level context aggregation and adaptive anomaly handling. At query stage, LiteSemRAG performs a two-step semantic-aware retrieval process that integrates co-occurrence graph weighting with an isolated semantic recovery mechanism, enabling balanced structural reasoning and semantic coverage. We evaluate LiteSemRAG on three benchmark datasets and experimental results show that LiteSemRAG achieves the best mean reciprocal rank (MRR@10) across all datasets and competitive or superior recall rate (Recall@10) compared to state-of-the-art LLM-based graph RAG systems. Meanwhile, LiteSemRAG consumes zero LLM tokens and achieves substantial efficiency improvements in both indexing and querying due to the elimination of LLM usage. These results demonstrate the effectiveness of LiteSemRAG, indicating that a strong semantic-aware graph retrieval framework can be achieved without relying on LLM-based approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LiteSemRAG, a lightweight LLM-free semantic-aware graph retrieval framework for RAG. It constructs a heterogeneous semantic graph from contextual token-level embeddings, introduces a dynamic semantic node construction mechanism with chunk-level aggregation and adaptive anomaly handling to model polysemy, and performs a two-step retrieval process integrating co-occurrence graph weighting with isolated semantic recovery. On three benchmark datasets, it claims the best MRR@10 across all datasets and competitive or superior Recall@10 versus state-of-the-art LLM-based graph RAG systems, while consuming zero LLM tokens and achieving efficiency gains in indexing and querying.
Significance. If the performance claims are supported by detailed, reproducible experiments, the work would be significant for showing that strong semantic-aware graph retrieval in RAG is achievable without any LLM involvement. This could reduce token costs, latency, and computational overhead while maintaining or improving retrieval quality, offering a practical alternative to LLM-heavy graph RAG approaches.
major comments (2)
- [Abstract] The central claims of best MRR@10 across all datasets and competitive Recall@10 are stated without experimental details, baseline descriptions, statistical tests, error bars, or dataset names, leaving the performance superiority unsupported by visible evidence and the result impossible to assess or reproduce.
- [Method] The dynamic semantic node construction with chunk-level context aggregation and adaptive anomaly handling is presented at a high level, with no equations, pseudocode, or formal definition of how surface lexical forms are separated from context-dependent senses; this mechanism is load-bearing for the polysemy-handling claim yet remains unformalized and unablated.
minor comments (2)
- [Abstract] The abstract refers to 'three benchmark datasets' without naming them or providing any statistics on their characteristics.
- Notation such as MRR@10 and Recall@10 should be explicitly defined on first use for readers unfamiliar with IR metrics.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment below and outline the revisions we will make to improve the manuscript's clarity, formality, and reproducibility.
Point-by-point responses
- Referee: [Abstract] The central claims of best MRR@10 across all datasets and competitive Recall@10 are stated without experimental details, baseline descriptions, statistical tests, error bars, or dataset names, leaving the performance superiority unsupported by visible evidence and the result impossible to assess or reproduce.
Authors: We agree that the abstract, due to length constraints, omits specific dataset names and quantitative details. The full manuscript (Section 4) reports results on three standard benchmarks with explicit MRR@10 and Recall@10 values against named LLM-based baselines. In the revision we will expand the abstract to name the datasets and cite the key MRR@10 figures while preserving brevity; we will also add a short reproducibility note. No error bars appear because retrieval is deterministic given fixed embeddings, but we will state this explicitly. revision: partial
- Referee: [Method] The dynamic semantic node construction with chunk-level context aggregation and adaptive anomaly handling is presented at a high level, with no equations, pseudocode, or formal definition of how surface lexical forms are separated from context-dependent senses; this mechanism is load-bearing for the polysemy-handling claim yet remains unformalized and unablated.
Authors: We accept that the current description is high-level. The revised manuscript will include: (i) formal equations defining the separation of surface lexical nodes from context-dependent semantic nodes via chunk-level aggregation, (ii) pseudocode for the dynamic node construction and adaptive anomaly handling steps, and (iii) an ablation study isolating the contribution of this mechanism to polysemy robustness. These additions will be placed in Section 3. revision: yes
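Pending that revision, here is one way such pseudocode could plausibly look. This is our own reconstruction from the abstract's wording, not the authors' Section 3; the thresholds `tau_join` and `tau_split` and the greedy clustering scheme are invented for illustration.

```python
import numpy as np

def build_sense_nodes(occurrences, tau_join=0.75, tau_split=0.4):
    """Greedy threshold clustering of one surface form's chunk-aggregated
    contextual embeddings into sense nodes, with an ambiguous middle band
    deferred to a separate anomaly-handling step.

    occurrences: list of (chunk_id, vec) pairs, where vec is the mean-pooled
    contextual embedding of the surface form within that chunk.
    """
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    senses, anomalies = [], []
    for chunk_id, vec in occurrences:
        sims = [cos(vec, s["centroid"]) for s in senses]
        best = max(sims, default=-1.0)
        if best >= tau_join:              # clearly an existing sense: merge
            s = senses[int(np.argmax(sims))]
            n = len(s["chunks"])
            s["centroid"] = (s["centroid"] * n + vec) / (n + 1)  # running mean
            s["chunks"].append(chunk_id)
        elif best < tau_split:            # clearly distinct: spawn a new sense
            senses.append({"centroid": vec.copy(), "chunks": [chunk_id]})
        else:                             # ambiguous band: defer to anomaly step
            anomalies.append((chunk_id, vec))
    return senses, anomalies
```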
Circularity Check
No circularity: performance claims rest on empirical benchmarks without self-referential derivations or fitted predictions.
Full rationale
The paper presents LiteSemRAG as an architectural system that builds a heterogeneous semantic graph from contextual token-level embeddings, applies dynamic node construction with chunk aggregation, and performs two-step retrieval via co-occurrence weighting plus isolated semantic recovery. No equations, derivations, or parameter-fitting steps are described that would reduce the reported MRR@10 or Recall@10 gains to inputs by construction. Results are framed as experimental outcomes on three benchmark datasets, with zero LLM tokens as a direct consequence of the LLM-free design rather than a renamed fit. No self-citations appear as load-bearing premises, and the core claims remain externally falsifiable via the stated benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Contextual token embeddings capture semantic distinctions independently of large language models.
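This premise is cheap to probe. Below is a minimal check with an off-the-shelf encoder; bert-base-uncased is our choice for illustration (the paper's encoder is not specified here), and the whole demo is ours, not the authors'.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vec(sentence, word):
    """Contextual embedding of `word` (assumed a single-token vocab entry)."""
    enc = tok(sentence, return_tensors="pt")
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    with torch.no_grad():
        return model(**enc).last_hidden_state[0, idx]

a = word_vec("She deposited the cash at the bank before noon.", "bank")
b = word_vec("They fished from the muddy bank of the river.", "bank")
c = word_vec("The bank approved the loan application.", "bank")
sim = torch.nn.functional.cosine_similarity
# If the axiom holds, the two financial uses sit closer together than the
# financial/riverbank pair: sim(a, c) > sim(a, b).
print(float(sim(a, b, dim=0)), float(sim(a, c, dim=0)))
```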
Reference graph
Works this paper leans on
- [1] Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., Van Den Driessche, G.B., Lespiau, J.B., Damoc, B., Clark, A., et al.: Improving language models by retrieving from trillions of tokens. In: International Conference on Machine Learning. pp. 2206–2240. PMLR (2022)
- [2] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Advances in Neural Information Processing Systems 33, 1877–1901 (2020)
- [3] Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171–4186 (2019)
- [4] Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Metropolitansky, D., Ness, R.O., Larson, J.: From local to global: A graph RAG approach to query-focused summarization. arXiv preprint arXiv:2404.16130 (2024)
- [5] Guo, Z., Xia, L., Yu, Y., Ao, T., Huang, C.: LightRAG: Simple and fast retrieval-augmented generation. arXiv preprint arXiv:2410.05779 (2024)
- [6] Gutiérrez, B.J., Shu, Y., Gu, Y., Yasunaga, M., Su, Y.: HippoRAG: Neurobiologically inspired long-term memory for large language models. Advances in Neural Information Processing Systems 37, 59532–59569 (2024)
- [7] Guu, K., Lee, K., Tung, Z., Pasupat, P., Chang, M.: Retrieval augmented language model pre-training. In: International Conference on Machine Learning. pp. 3929–3938. PMLR (2020)
- [8] Izacard, G., Grave, E.: Leveraging passage retrieval with generative models for open domain question answering. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. pp. 874–880 (2021)
- [9] Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., Yih, W.t.: Dense passage retrieval for open-domain question answering. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 6769–6781 (2020)
- [10] Khattab, O., Zaharia, M.: ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 39–48 (2020)
- [11] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.t., Rocktäschel, T., et al.: Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33, 9459–9474 (2020)
- [12] Sarthi, P., Abdullah, S., Tuli, A., Khanna, S., Goldie, A., Manning, C.D.: RAPTOR: Recursive abstractive processing for tree-organized retrieval. In: The Twelfth International Conference on Learning Representations (2024)
- [13] Thakur, N., Reimers, N., Rücklé, A., Srivastava, A., Gurevych, I.: BEIR: A heterogenous benchmark for zero-shot evaluation of information retrieval models. arXiv preprint arXiv:2104.08663 (2021)
- [14] Wadden, D., Lin, S., Lo, K., Wang, L.L., van Zuylen, M., Cohan, A., Hajishirzi, H.: Fact or fiction: Verifying scientific claims. In: EMNLP (2020)
- [15] Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W., Salakhutdinov, R., Manning, C.D.: HotpotQA: A dataset for diverse, explainable multi-hop question answering. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 2369–2380 (2018)
- [16] Yu, C., Su, F.: NoLLMRAG: LLM-free makes graph-based RAG highly efficient, effective and generalizable
- [17] Zhao, P., Zhang, H., Yu, Q., Wang, Z., Geng, Y., Fu, F., Yang, L., Zhang, W., Jiang, J., Cui, B.: Retrieval-augmented generation for AI-generated content: A survey. Data Science and Engineering pp. 1–29 (2026)