ACGM learns task-adaptive sparse graphs over multi-modal agent histories via policy-gradient optimization, reaching 82.7 nDCG@10 and 89.2% Precision@10 on WebShop, VisualWebArena, and Mind2Web while outperforming 19 baselines.
arXiv preprint arXiv:2507.20804 (2025),https: //arxiv.org/abs/2507.20804
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.IR 2years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
MG²-RAG proposes a multi-granularity graph RAG framework that constructs hierarchical multimodal nodes via entity-driven visual grounding and performs structured retrieval, delivering SOTA results on four multimodal tasks with 43.3× faster graph construction.
citing papers explorer
-
Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory
ACGM learns task-adaptive sparse graphs over multi-modal agent histories via policy-gradient optimization, reaching 82.7 nDCG@10 and 89.2% Precision@10 on WebShop, VisualWebArena, and Mind2Web while outperforming 19 baselines.
-
MG$^2$-RAG: Multi-Granularity Graph for Multimodal Retrieval-Augmented Generation
MG²-RAG proposes a multi-granularity graph RAG framework that constructs hierarchical multimodal nodes via entity-driven visual grounding and performs structured retrieval, delivering SOTA results on four multimodal tasks with 43.3× faster graph construction.