DualView: Adaptive Local-Global Fusion for Multi-Hop Document Reranking
Pith reviewed 2026-05-21 00:21 UTC · model grok-4.3
Paper venue (from the paper's running header): CIKM ’26, November 7–11, 2026, Rome, Italy
The pith
A dual-view reranker combines detailed query-document matches with inter-document relations through an adaptive gate to select minimal relevant sets for multi-hop questions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an efficient dual-view cascaded reranking framework delivers competitive recall and accuracy for multi-hop document selection at low latency in a fixed-candidate setting. The framework pairs a Local Scorer, which uses stacked cross-attention to score query-document relevance, with a Global Scorer, which uses Transformer-based context aggregation to model inter-document dependencies. An Adaptive Gate conditioned on query semantics fuses the two views dynamically.
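To make "context aggregation for inter-document dependencies" concrete, here is a minimal sketch of the general idea, not the paper's Global Scorer. It assumes a single attention pass with no learned parameters and a dot-product readout against the query. The function names `aggregate_context` and `global_scores` are hypothetical, and the abstract does not specify layer counts, heads, or projections.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate_context(doc_vecs):
    """One parameter-free self-attention pass over the candidate pool.

    ASSUMPTION: illustrative stand-in for a Transformer layer. Each
    document vector becomes a softmax-weighted average of all candidate
    vectors, so its representation reflects the other candidates.
    """
    scale = math.sqrt(len(doc_vecs[0]))
    out = []
    for q in doc_vecs:
        weights = softmax([dot(q, k) / scale for k in doc_vecs])
        mixed = [sum(w * v[i] for w, v in zip(weights, doc_vecs))
                 for i in range(len(q))]
        out.append(mixed)
    return out

def global_scores(query_vec, doc_vecs):
    """Score each context-aware document vector against the query."""
    return [dot(query_vec, d) for d in aggregate_context(doc_vecs)]

# Toy pool: two near-duplicate candidates and one unrelated candidate.
docs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
scores = global_scores([1.0, 0.0], docs)
```

The point of the sketch is only that a document's global score depends on the rest of the pool, which a purely query-document scorer cannot express.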
What carries the argument
The adaptive gate that weights the local cross-attention scores and the global context-aggregated scores according to query semantics.
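The abstract does not give the gate's functional form, so the following is a sketch under stated assumptions rather than the authors' mechanism: a scalar sigmoid gate computed from a query vector, used as a convex mixing weight between the two score lists. The names `gate_value`, `fuse`, and `top_k`, and the linear gate parameters, are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate_value(query_vec, gate_weights, gate_bias=0.0):
    """ASSUMPTION: a linear map of query features squashed into (0, 1)."""
    z = sum(q * w for q, w in zip(query_vec, gate_weights)) + gate_bias
    return sigmoid(z)

def fuse(local_scores, global_scores, g):
    """Convex combination: g weights the local view, 1 - g the global view."""
    return [g * l + (1.0 - g) * s for l, s in zip(local_scores, global_scores)]

def top_k(scores, k):
    """Indices of the k highest fused scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

local = [0.9, 0.2, 0.4]   # per-document local relevance (toy values)
glob = [0.3, 0.8, 0.1]    # per-document global scores (toy values)

g = gate_value([0.0, 0.0], [1.0, 1.0])      # zero logit gives g = 0.5
balanced = top_k(fuse(local, glob, g), 2)   # both views contribute
local_only = top_k(fuse(local, glob, 1.0), 2)
global_only = top_k(fuse(local, glob, 0.0), 2)
```

Pinning `g` to 1.0 or 0.0 recovers a single-view ranking, which is one simple way to mimic a view ablation in this toy setting. Whether the paper's ablations are run this way is not stated.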
If this is right
- The method identifies the smallest useful document set more accurately for questions that span multiple sources.
- Both the local relevance view and the global dependency view must be present to reach peak performance on multi-hop tasks.
- The approach functions as a fast post-retrieval step that preserves high recall while reducing processing time per query.
- Ablation results indicate that each view contributes distinct value that the other cannot supply alone.
Where Pith is reading between the lines
- The same adaptive blending of local and global signals could be tested on retrieval tasks that involve longer contexts or cross-modal items.
- One could check whether the gate mechanism improves results when the number of hops varies widely across queries.
- Extending the framework to update embeddings dynamically instead of relying on cached ones would test its robustness in changing retrieval environments.
Load-bearing premise
The reranking gains assume that the initial fixed pool of candidate documents is already representative enough for the fusion step to select the minimal relevant subset effectively.
What would settle it
Measure top-k recall on a multi-hop dataset after swapping the fixed candidate pool for one produced by a different initial retriever or generated on the fly without caching.
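A minimal sketch of how such a comparison could be scored follows. It assumes Top-k Recall is the fraction of gold documents found in the top k, and that Full Hit means every gold document is in the top k. The abstract does not define either metric, so both definitions are assumptions, and the names `recall_at_k`, `full_hit`, and `evaluate` are hypothetical.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold documents that appear in the top-k ranking."""
    top = set(ranked_ids[:k])
    return sum(1 for g in gold_ids if g in top) / len(gold_ids)

def full_hit(ranked_ids, gold_ids, k):
    """True only when every gold document is inside the top-k."""
    return set(gold_ids) <= set(ranked_ids[:k])

def evaluate(rankings, gold, k=4):
    """Mean recall@k and full-hit rate over a dict of query id -> ranking."""
    n = len(rankings)
    rec = sum(recall_at_k(rankings[q], gold[q], k) for q in rankings) / n
    hit = sum(full_hit(rankings[q], gold[q], k) for q in rankings) / n
    return rec, hit

# Toy data: q1 has both gold documents in the top 4, q2 has only one.
rankings = {"q1": ["a", "b", "c", "d", "e"], "q2": ["x", "y", "z", "w", "v"]}
gold = {"q1": ["a", "d"], "q2": ["x", "v"]}
mean_recall, full_hit_rate = evaluate(rankings, gold, k=4)
```

Running `evaluate` once per candidate pool (the original pool and the swapped one) with the same gold labels would give the two numbers the test calls for.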
Original abstract
Multi-hop question answering requires aggregating information from multiple documents, a critical capability for knowledge-intensive applications. A fundamental challenge lies in efficiently identifying the minimal relevant document set from retrieved candidates while maintaining high recall. We present an efficient dual-view cascaded reranking framework for multi-hop document reranking. Operating as a lightweight post-retrieval stage over E5-base-v2 candidates, our architecture comprises: (1) a Local Scorer employing stacked cross-attention for fine-grained query-document relevance; and (2) a Global Scorer modeling inter-document dependencies via Transformer-based context aggregation. These views are dynamically fused through an Adaptive Gate conditioned on query semantics. Under the fixed candidate set reranking setting with offline cached embeddings, our model achieves competitive results, particularly outstanding on MuSiQue with 99.4% Top-4 Recall and 97.8% Full Hit accuracy at 4.0 ms latency (249 QPS). It substantially outperforms 600M-parameter cross-encoders (BGE-Large: 92.0% Recall, Jina-v3: 90.1% Recall) while maintaining 5 to 6 times lower latency. Ablation studies validate that both Local and Global views contribute substantially to multi-hop performance.
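The latency and throughput figures in the abstract can be cross-checked with simple arithmetic. The sketch below assumes queries are processed one at a time with no batching or concurrency; the abstract does not say how QPS was measured, so this is only a consistency check. The helper names are hypothetical.

```python
def qps_from_latency_ms(latency_ms):
    """Sequential throughput implied by a per-query latency."""
    return 1000.0 / latency_ms

def latency_ms_from_qps(qps):
    """Per-query latency implied by a sequential throughput."""
    return 1000.0 / qps

print(qps_from_latency_ms(4.0))            # 250.0
print(round(latency_ms_from_qps(249), 3))  # 4.016

# Baseline latency range implied by "5 to 6 times lower latency"
# relative to 4.0 ms (an inference, not a number from the abstract).
print(5 * 4.0, 6 * 4.0)                    # 20.0 24.0
```

Under that assumption, 249 QPS and 4.0 ms agree to within rounding, and the cross-encoder baselines would sit at roughly 20 to 24 ms per query.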
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces DualView, an efficient dual-view cascaded reranking framework for multi-hop document reranking. It operates as a lightweight post-retrieval stage over fixed candidates retrieved by E5-base-v2 with offline cached embeddings. The model comprises a Local Scorer using stacked cross-attention for fine-grained query-document relevance and a Global Scorer using Transformer-based context aggregation for inter-document dependencies; these are dynamically fused via an Adaptive Gate conditioned on query semantics. Under this fixed-candidate setting, the paper reports strong results on MuSiQue (99.4% Top-4 Recall, 97.8% Full Hit at 4.0 ms latency / 249 QPS), outperforming 600M-parameter cross-encoders such as BGE-Large while maintaining 5-6x lower latency. Ablation studies are cited to show that both local and global views contribute to multi-hop performance.
Significance. If the central performance claims hold after addressing the experimental gaps, the work offers a practical, low-latency reranking component for multi-hop QA pipelines that adaptively combines local and global signals. The emphasis on efficiency over large cross-encoders and the reported ablation results would constitute a useful engineering contribution to the cs.IR literature on retrieval augmentation for knowledge-intensive tasks.
major comments (1)
- [Abstract] (fixed-candidate reranking setting): The reported 99.4% Top-4 Recall and 97.8% Full Hit on MuSiQue are obtained by scoring and fusing only within a pre-retrieved pool of E5-base-v2 candidates. The manuscript provides no coverage statistics (e.g., average recall of gold documents inside the initial pool, or a comparison of E5 retrieval recall versus final DualView recall). Because the architecture cannot recover documents absent from this pool, the claimed gains over cross-encoders and the advantage of the adaptive local-global fusion cannot be isolated from the strength of the upstream retriever; this is load-bearing for the central performance claims.
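The coverage statistics requested here could be computed along the following lines. This is a sketch with hypothetical names (`pool_coverage`, `coverage_report`) and toy data, assuming each query comes with a candidate pool and a gold document set. Mean coverage is an upper bound on the recall of any reranker over that pool, and the fully covered share is an upper bound on its Full Hit rate.

```python
def pool_coverage(pool_ids, gold_ids):
    """Fraction of gold documents present anywhere in the candidate pool."""
    pool = set(pool_ids)
    return sum(1 for g in gold_ids if g in pool) / len(gold_ids)

def coverage_report(pools, gold):
    """Mean coverage, and share of queries whose gold set is fully in the pool."""
    n = len(pools)
    per_query = [pool_coverage(pools[q], gold[q]) for q in pools]
    mean_cov = sum(per_query) / n
    fully_covered = sum(1 for c in per_query if c == 1.0) / n
    return mean_cov, fully_covered

# Toy data: q2's pool is missing the gold document "z", so no reranker
# operating on that pool can reach full recall for q2.
pools = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
gold = {"q1": ["a", "c"], "q2": ["x", "z"]}
mean_cov, fully_covered = coverage_report(pools, gold)
```

Reporting these two ceilings next to the reranker's own scores would show how much of the headline number is attributable to the upstream retriever.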
minor comments (1)
- [Abstract] No training details, exact loss functions, hyperparameter choices, or statistical significance tests are reported, leaving the numeric results difficult to reproduce or compare.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for highlighting an important aspect of our experimental setting. We address the major comment below and will revise the manuscript to provide greater clarity on the role of the upstream retriever.
- Referee: [Abstract] (fixed-candidate reranking setting): The reported 99.4% Top-4 Recall and 97.8% Full Hit on MuSiQue are obtained by scoring and fusing only within a pre-retrieved pool of E5-base-v2 candidates. The manuscript provides no coverage statistics (e.g., average recall of gold documents inside the initial pool, or a comparison of E5 retrieval recall versus final DualView recall). Because the architecture cannot recover documents absent from this pool, the claimed gains over cross-encoders and the advantage of the adaptive local-global fusion cannot be isolated from the strength of the upstream retriever; this is load-bearing for the central performance claims.
Authors: We agree that reporting coverage statistics for the initial candidate pool would strengthen the presentation and help readers better contextualize the results. Our work specifically targets the post-retrieval reranking stage over a fixed set of candidates (a practical and widely used setting for low-latency multi-hop QA), rather than end-to-end retrieval. In the revised manuscript we will add the requested statistics, including the average recall of gold documents in the E5-base-v2 pool and a direct comparison of E5 retrieval recall versus DualView recall on MuSiQue. This will more clearly isolate the contribution of the adaptive local-global fusion. The reported comparisons with cross-encoders (BGE-Large, Jina-v3) are performed under the identical fixed-candidate protocol, ensuring that the efficiency and accuracy advantages reflect the reranker itself rather than differences in retrieval coverage. Ablation results further support the value of the dual-view design within this controlled setting.
Revision: yes
Circularity Check
No circularity in derivation chain; claims are empirical evaluations
The paper presents a dual-view cascaded reranking architecture with local cross-attention scorer, global Transformer aggregator, and adaptive gate fusion, all described as an engineering design. Reported metrics (e.g., 99.4% Top-4 Recall on MuSiQue) are direct experimental results under a fixed E5-base-v2 candidate pool with offline embeddings. No equations, parameter fits, or self-citations are invoked to derive performance by construction; ablations simply confirm component contributions on standard benchmarks. The framework is self-contained against external retrieval and reranking baselines.