pith. sign in

arxiv: 2605.28641 · v2 · pith:GDHPY2XMnew · submitted 2026-05-27 · 💻 cs.IR

Subtraction Gets You More: Gap-Aware Retrieval for Multimodal Multi-Hop QA

classification 💻 cs.IR
keywords retrievalevidencegap-awaregrailembeddingimplicititerativemulti-hop
0
0 comments X
read the original abstract

In multimodal multi-hop question answering, we focus on the initial retrieval stage via two distinct tasks: (1) evidence set completion, retrieving missing evidence given context, and (2) sequential pool construction, iteratively building the top-$K$ pool from the scratch. Under these settings, we point out that conventional iterative retrieval frameworks often suffer from Semantic Anchoring, where previously fetched evidence traps the retriever and yields entity-centric redundancy. To break this trap, we propose GRAIL (Gap-aware Retrieval via Adaptive Implicit Localization), a paradigm that performs implicit query rewriting directly at the embedding level. By context-subtractive query steering, GRAIL excels at compositional cross-modal reasoning, while additive embedding updates show strength on localized information aggregation. By dynamically routing queries based on task type, our Hybrid Framework achieves a 40.3% macro-averaged performance gain on MultimodalQA. Extensive evaluations demonstrate that sequential GRAIL retrieves in a superior, noise-resilient manner, significantly expanding the search horizon through iterative gap-aware optimization.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.