HGMEM: Hypergraph-based Working Memory to Improve Multi-step RAG for Long-Context Complex Relational Modeling

Chulun Zhou; Chunkang Zhang; Fandong Meng; Guoxin Yu; Jie Zhou; Mo Yu; Wai Lam

arXiv: 2512.23959 · v3 · pith:NZCINKTK · submitted 2025-12-30 · cs.CL · cs.AI · cs.LG

Keywords: memory, reasoning, global, multi-step, facts, hgmem, working, complex

Abstract

Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs) on tasks that demand global comprehension and intensive reasoning. Although many RAG systems incorporate a working memory to consolidate information, existing designs primarily function as passive storage for isolated facts. This static design overlooks crucial high-order correlations among primitive facts, which limits models' capacity for multi-step reasoning and leads to fragmented reasoning and weak global sense-making over extended contexts. We introduce HGMem, a hypergraph-based working memory system that extends the concept of memory beyond simple storage into a dynamic, expressive structure for complex reasoning and global understanding. In our approach, memory is represented as a hypergraph in which hyperedges correspond to distinct memory units, enabling the progressive formation of high-order interactions within memory. This mechanism connects facts and thoughts around the focal problem, evolving the memory into an integrated and situated knowledge structure that provides strong propositions for deeper reasoning. We evaluate HGMem on several challenging global sense-making benchmarks. Extensive experiments and in-depth analyses demonstrate that our method consistently improves multi-step RAG and substantially outperforms strong baseline systems across diverse datasets.
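The abstract describes memory as a hypergraph whose hyperedges are memory units that can link any number of facts, and which are progressively combined into higher-order units. As a rough illustration of that data structure only, here is a minimal Python sketch. It is not HGMem's implementation: the class and method names (`HypergraphMemory`, `MemoryUnit`, `add_unit`, `merge_units`, `units_touching`) are hypothetical, vertices are plain strings standing in for facts or entities, and the merge rule (union the vertex sets and replace the merged units) is an assumption. No LLM or retrieval step is modeled.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MemoryUnit:
    """One hyperedge: a memory unit linking an arbitrary set of vertices."""
    uid: int
    vertices: frozenset
    description: str


@dataclass
class HypergraphMemory:
    """Hypothetical hypergraph working memory (illustrative, not the paper's code)."""
    units: dict = field(default_factory=dict)
    _ids: count = field(default_factory=count, repr=False)

    def add_unit(self, vertices, description):
        # A hyperedge may connect any number of vertices, unlike a pairwise graph edge.
        uid = next(self._ids)
        self.units[uid] = MemoryUnit(uid, frozenset(vertices), description)
        return uid

    def merge_units(self, uids, description):
        # Assumed consolidation rule: the new higher-order unit covers the union
        # of its parts' vertices, and the merged units are removed from memory.
        merged = frozenset().union(*(self.units[u].vertices for u in uids))
        for u in uids:
            del self.units[u]
        return self.add_unit(merged, description)

    def units_touching(self, query_vertices):
        # Return every memory unit that shares at least one vertex with the query.
        q = set(query_vertices)
        return [u for u in self.units.values() if u.vertices & q]


if __name__ == "__main__":
    mem = HypergraphMemory()
    a = mem.add_unit({"Alice", "AcmeCorp"}, "Alice founded AcmeCorp")
    b = mem.add_unit({"AcmeCorp", "Berlin"}, "AcmeCorp is based in Berlin")
    c = mem.merge_units([a, b], "Alice founded a Berlin-based company")
    print(sorted(mem.units[c].vertices))
```

The design point the sketch tries to capture is that a single unit can tie together three or more facts at once, so a query on one vertex (here `"Berlin"`) reaches the whole consolidated unit rather than one isolated fact.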

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Judge Like Human Examiners: A Weighted Importance Multi-Point Evaluation Framework for Generative Tasks with Long-form Answers
cs.CL 2026-04 unverdicted novelty 6.0

WIMPE factorizes reference answers into weighted context-bound points and applies alignment (WPA) and conflict penalty (PCP) metrics, yielding higher human correlation than prior rubric or checklist methods across 10 ...