arXiv: 2512.20626 · v2 · submitted 2025-11-26 · 💻 cs.AI · cs.CL · cs.CV · cs.IR

MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation

keywords: knowledge, reasoning, generation, multimodal, visual, answering, content, cues

Retrieval-augmented generation (RAG) enables large language models (LLMs) to dynamically access external information, which is powerful for answering questions over previously unseen documents. Nonetheless, RAG systems struggle with high-level conceptual understanding and holistic comprehension because limited context windows constrain their ability to reason deeply over long-form, domain-specific content such as full-length books. To address this, knowledge graphs (KGs) have been used to provide entity-centric structure and hierarchical summaries, offering more structured support for reasoning. However, existing KG-based RAG solutions remain restricted to text-only inputs and fail to leverage the complementary insights provided by other modalities such as vision. Reasoning over visual documents, in turn, requires integrating textual, visual, and spatial cues into structured, hierarchical concepts. To address this issue, we introduce a multimodal knowledge graph-based RAG that enables cross-modal reasoning for better content understanding. Our method incorporates visual cues into the construction of knowledge graphs, the retrieval phase, and the answer generation process. Experimental results across both global and fine-grained question answering tasks show that our approach consistently outperforms existing RAG-based approaches on both textual and multimodal corpora.
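
The abstract names three places where visual cues enter the pipeline: graph construction, retrieval, and answer generation. The toy sketch below illustrates that general idea only; it is not the authors' implementation. Every name in it (`Entity`, `MultimodalKG`, `image_refs`, `build_context`, the file name `page3_fig2.png`) is hypothetical, and word-overlap scoring stands in for whatever retriever the paper actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    description: str
    # Hypothetical: paths to figures or page crops associated with this entity.
    image_refs: list = field(default_factory=list)


@dataclass
class MultimodalKG:
    entities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (head, relation, tail) triples

    def add_entity(self, name, description, image_refs=()):
        # Construction step: an entity may carry visual references as well as text.
        self.entities[name] = Entity(name, description, list(image_refs))

    def add_relation(self, head, relation, tail):
        self.relations.append((head, relation, tail))

    def retrieve(self, query, top_k=2):
        # Retrieval step: rank entities by word overlap with the query.
        # This is a placeholder for a real (e.g. embedding-based) retriever.
        q = set(query.lower().split())

        def score(e):
            return len(q & set(f"{e.name} {e.description}".lower().split()))

        ranked = sorted(self.entities.values(), key=score, reverse=True)
        return [e for e in ranked[:top_k] if score(e) > 0]

    def build_context(self, query, top_k=2):
        # Generation step: collect text and image references that a
        # multimodal model could be prompted with.
        hits = self.retrieve(query, top_k)
        names = {e.name for e in hits}
        text = [f"{e.name}: {e.description}" for e in hits]
        text += [f"{h} -[{r}]-> {t}" for h, r, t in self.relations
                 if h in names or t in names]
        images = [ref for e in hits for ref in e.image_refs]
        return {"text": text, "images": images}


# Usage with made-up data.
kg = MultimodalKG()
kg.add_entity("Figure 2", "bar chart of retrieval accuracy", ["page3_fig2.png"])
kg.add_entity("Retriever", "module that selects graph nodes for a query")
kg.add_relation("Figure 2", "evaluates", "Retriever")
ctx = kg.build_context("retrieval accuracy chart")
```

Keeping image references on the graph nodes means a single retrieval pass returns both the textual facts and the visuals tied to them, which is one simple way to carry visual cues through to the generation prompt.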

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios

    cs.AI 2026-05 unverdicted novelty 6.0

    Event-Causal RAG segments videos into events represented as SES graphs, merges them into a causal knowledge graph, and uses bidirectional retrieval to supply relevant event chains to a video foundation model for impro...

  2. Geometry-Aware Localized Watermarking for Copyright Protection in Embedding-as-a-Service

    cs.CR 2026-04 unverdicted novelty 6.0

    GeoMark decouples local watermark triggering from centralized ownership attribution using geometry-separated anchors and adaptive neighborhoods to improve robustness against paraphrasing, dimension changes, and cluste...