arxiv: 2605.18767 · v1 · pith:M54CU2WXnew · submitted 2026-04-13 · 💻 cs.IR · cs.AI

DualView: Adaptive Local-Global Fusion for Multi-Hop Document Reranking

Pith reviewed 2026-05-21 00:21 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords multi-hop document reranking · adaptive fusion · local-global modeling · information retrieval · question answering · transformer aggregation

The pith

A dual-view reranker combines detailed query-document matches with inter-document relations through an adaptive gate to select minimal relevant sets for multi-hop questions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that a lightweight architecture can improve multi-hop document reranking by running a local scorer that checks fine-grained relevance for each document and a global scorer that models how documents depend on one another. These two outputs are blended by a gate whose behavior changes with the query itself. If the approach holds, systems that answer questions needing facts from several sources could retrieve the right small subset of documents more reliably while using far less time per query than current large models.

Core claim

The central claim is that an efficient dual-view cascaded reranking framework, consisting of a Local Scorer with stacked cross-attention for query-document relevance and a Global Scorer with Transformer-based context aggregation for inter-document dependencies, fused dynamically by an Adaptive Gate conditioned on query semantics, delivers competitive recall and accuracy for multi-hop document selection at low latency in a fixed-candidate setting.
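
The two views named in the claim can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a single attention head, no learned projections, and plain dot-product scoring, and the names `attend`, `local_score`, and `global_scores` are hypothetical. It only shows the structural difference between scoring one document against the query (local) and scoring a document after it has attended to the other candidates (global).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def local_score(query_vec, doc_token_vecs):
    """Local view (assumed form): the query attends over one document's tokens."""
    pooled = attend(query_vec, doc_token_vecs, doc_token_vecs)
    return dot(query_vec, pooled)

def global_scores(query_vec, doc_vecs):
    """Global view (assumed form): each document attends over all candidates
    before being scored against the query."""
    contextual = [attend(d, doc_vecs, doc_vecs) for d in doc_vecs]
    return [dot(query_vec, c) for c in contextual]
```

In this toy form the local score depends only on the query and one document, while every global score changes when any candidate in the pool changes, which is the property the claim relies on for multi-hop questions.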

What carries the argument

The adaptive gate that weights the local cross-attention scores and the global context-aggregated scores according to query semantics.
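
A minimal sketch of query-conditioned fusion follows. The abstract does not give the gate's functional form, so this assumes a scalar sigmoid gate computed from the query embedding and a convex combination of the two score lists; `gate_w`, `gate_b`, and all function names are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gate_weight(query_vec, gate_w, gate_b):
    """Assumed gate: a scalar in (0, 1) computed from the query embedding."""
    z = sum(q * w for q, w in zip(query_vec, gate_w)) + gate_b
    return sigmoid(z)

def fuse_scores(local_scores, global_scores, g):
    """Convex combination: g weights the local view, (1 - g) the global view."""
    return [g * l + (1.0 - g) * s for l, s in zip(local_scores, global_scores)]

def top_k(scores, k):
    """Indices of the k highest fused scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Example: a gate of 0.5 weights both views equally.
g = gate_weight([0.0, 0.0], [1.0, 1.0], 0.0)
fused = fuse_scores([0.9, 0.1, 0.4], [0.2, 0.8, 0.4], g)
selected = top_k(fused, 2)
```

Because the gate depends only on the query, it can be computed once per query and reused for every candidate document.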

If this is right

  • The method identifies the smallest useful document set more accurately for questions that span multiple sources.
  • Both the local relevance view and the global dependency view must be present to reach peak performance on multi-hop tasks.
  • The approach functions as a fast post-retrieval step that preserves high recall while reducing processing time per query.
  • Ablation results indicate that each view contributes distinct value that the other cannot supply alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same adaptive blending of local and global signals could be tested on retrieval tasks that involve longer contexts or cross-modal items.
  • One could check whether the gate mechanism improves results when the number of hops varies widely across queries.
  • Extending the framework to update embeddings dynamically instead of relying on cached ones would test its robustness in changing retrieval environments.

Load-bearing premise

The reranking gains assume that the initial fixed pool of candidate documents is already representative enough for the fusion step to select the minimal relevant subset effectively.
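
One way to quantify this premise is to measure how much of the gold evidence the candidate pool contains, since a reranker cannot score a document that is not in the pool. The sketch below assumes each example is a pair of gold document IDs and pool document IDs; the function names and data layout are illustrative, not taken from the paper.

```python
def pool_coverage(gold_ids, pool_ids):
    """Fraction of gold documents present in the candidate pool.
    This is the ceiling on recall for any reranker over that pool."""
    gold = set(gold_ids)
    if not gold:
        return 1.0  # nothing to find, so nothing is missing
    return len(gold & set(pool_ids)) / len(gold)

def mean_coverage(examples):
    """Average coverage over (gold_ids, pool_ids) pairs."""
    values = [pool_coverage(g, p) for g, p in examples]
    return sum(values) / len(values)

def fully_covered_rate(examples):
    """Share of queries whose pool contains every gold document."""
    hits = [pool_coverage(g, p) == 1.0 for g, p in examples]
    return sum(hits) / len(hits)
```

If mean coverage is already close to 1.0, the reranker's task reduces to ordering documents that are known to be present, which is the condition the premise describes.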

What would settle it

Measure top-k recall on a multi-hop dataset after swapping the fixed candidate pool for one produced by a different initial retriever or generated on the fly without caching.
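
The measurement could be scripted along the following lines. The abstract does not define its metrics, so this sketch assumes Top-k Recall is the fraction of gold documents ranked in the top k and Full Hit is 1 only when all gold documents appear there; function names are hypothetical.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of gold documents that appear in the top-k ranked list."""
    gold = set(gold_ids)
    return len(gold & set(ranked_ids[:k])) / len(gold)

def full_hit_at_k(ranked_ids, gold_ids, k):
    """True only if every gold document appears in the top k."""
    return set(gold_ids) <= set(ranked_ids[:k])

def evaluate(runs, k=4):
    """runs: list of (ranked_ids, gold_ids) pairs, one per query."""
    n = len(runs)
    recall = sum(recall_at_k(r, g, k) for r, g in runs) / n
    full_hit = sum(full_hit_at_k(r, g, k) for r, g in runs) / n
    return {"recall": recall, "full_hit": full_hit}
```

Running `evaluate` once on rankings built from the original pool and once on rankings built from the swapped pool, with the same k, gives the two numbers to compare.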

Figures

Figures reproduced from arXiv: 2605.18767 by Jiaxin Li, Kuo Zhao, Litong Zhang.

Figure 1: Overview of our lightweight cascaded reranker. Given a query and candidate documents encoded by a frozen E5 encoder, […]
Figure 2: Architecture of the Local and Global Scorers. The […]
Figure 3: Adaptive Gate architecture for dynamic view fu[…]
Original abstract

Multi-hop question answering requires aggregating information from multiple documents, a critical capability for knowledge-intensive applications. A fundamental challenge lies in efficiently identifying the minimal relevant document set from retrieved candidates while maintaining high recall. We present an efficient dual-view cascaded reranking framework for multi-hop document reranking. Operating as a lightweight post-retrieval stage over E5-base-v2 candidates, our architecture comprises: (1) a Local Scorer employing stacked cross-attention for fine-grained query-document relevance; and (2) a Global Scorer modeling inter-document dependencies via Transformer-based context aggregation. These views are dynamically fused through an Adaptive Gate conditioned on query semantics. Under the fixed candidate set reranking setting with offline cached embeddings, our model achieves competitive results, particularly outstanding on MuSiQue with 99.4% Top-4 Recall and 97.8% Full Hit accuracy at 4.0 ms latency (249 QPS). It substantially outperforms 600M-parameter cross-encoders (BGE-Large: 92.0% Recall, Jina-v3: 90.1% Recall) while maintaining 5 to 6 times lower latency. Ablation studies validate that both Local and Global views contribute substantially to multi-hop performance.
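
The latency and throughput figures in the abstract can be cross-checked with simple arithmetic. The sketch below assumes queries are processed one at a time, so that QPS is the reciprocal of per-query latency; the abstract does not say how QPS was measured, and the helper names are illustrative.

```python
def qps_from_latency_ms(latency_ms: float) -> float:
    """Queries per second if each query takes latency_ms, processed serially."""
    return 1000.0 / latency_ms

def latency_ms_from_qps(qps: float) -> float:
    """Per-query latency in milliseconds implied by a serial throughput."""
    return 1000.0 / qps

def baseline_latency_range(latency_ms: float, low_factor: float, high_factor: float):
    """Baseline latency implied by a 'low_factor to high_factor times lower' claim."""
    return (latency_ms * low_factor, latency_ms * high_factor)
```

Under this assumption, 4.0 ms corresponds to 250 QPS, close to the reported 249, which would match an unrounded latency of about 4.02 ms. The stated 5 to 6 times gap would put the cross-encoder baselines at roughly 20 to 24 ms per query.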

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces DualView, an efficient dual-view cascaded reranking framework for multi-hop document reranking. It operates as a lightweight post-retrieval stage over fixed candidates retrieved by E5-base-v2 with offline cached embeddings. The model comprises a Local Scorer using stacked cross-attention for fine-grained query-document relevance and a Global Scorer using Transformer-based context aggregation for inter-document dependencies; these are dynamically fused via an Adaptive Gate conditioned on query semantics. Under this fixed-candidate setting, the paper reports strong results on MuSiQue (99.4% Top-4 Recall, 97.8% Full Hit at 4.0 ms latency / 249 QPS), outperforming 600M-parameter cross-encoders such as BGE-Large while maintaining 5-6x lower latency. Ablation studies are cited to show that both local and global views contribute to multi-hop performance.

Significance. If the central performance claims hold after addressing the experimental gaps, the work offers a practical, low-latency reranking component for multi-hop QA pipelines that adaptively combines local and global signals. The emphasis on efficiency over large cross-encoders and the reported ablation results would constitute a useful engineering contribution to the cs.IR literature on retrieval augmentation for knowledge-intensive tasks.

major comments (1)
  1. [Abstract] Abstract (fixed candidate set reranking setting): The reported 99.4% Top-4 Recall and 97.8% Full Hit on MuSiQue are obtained by scoring and fusing only within a pre-retrieved pool of E5-base-v2 candidates. The manuscript provides no coverage statistics (e.g., average recall of gold documents inside the initial pool or comparison of E5 retrieval recall versus final DualView recall). Because the architecture cannot recover documents absent from this pool, the claimed gains over cross-encoders and the advantage of the adaptive local-global fusion cannot be isolated from the strength of the upstream retriever; this is load-bearing for the central performance claims.
minor comments (1)
  1. [Abstract] Abstract: No training details, exact loss functions, hyperparameter choices, or statistical significance tests are reported, leaving the numeric results difficult to reproduce or compare.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for highlighting an important aspect of our experimental setting. We address the major comment below and will revise the manuscript to provide greater clarity on the role of the upstream retriever.

Point-by-point responses
  1. Referee: [Abstract] Abstract (fixed candidate set reranking setting): The reported 99.4% Top-4 Recall and 97.8% Full Hit on MuSiQue are obtained by scoring and fusing only within a pre-retrieved pool of E5-base-v2 candidates. The manuscript provides no coverage statistics (e.g., average recall of gold documents inside the initial pool or comparison of E5 retrieval recall versus final DualView recall). Because the architecture cannot recover documents absent from this pool, the claimed gains over cross-encoders and the advantage of the adaptive local-global fusion cannot be isolated from the strength of the upstream retriever; this is load-bearing for the central performance claims.

    Authors: We agree that reporting coverage statistics for the initial candidate pool would strengthen the presentation and help readers better contextualize the results. Our work specifically targets the post-retrieval reranking stage over a fixed set of candidates (a practical and widely used setting for low-latency multi-hop QA), rather than end-to-end retrieval. In the revised manuscript we will add the requested statistics, including the average recall of gold documents in the E5-base-v2 pool and a direct comparison of E5 retrieval recall versus DualView recall on MuSiQue. This will more clearly isolate the contribution of the adaptive local-global fusion. The reported comparisons with cross-encoders (BGE-Large, Jina-v3) are performed under the identical fixed-candidate protocol, ensuring that the efficiency and accuracy advantages reflect the reranker itself rather than differences in retrieval coverage. Ablation results further support the value of the dual-view design within this controlled setting.

    Revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain; claims are empirical evaluations

The paper presents a dual-view cascaded reranking architecture with local cross-attention scorer, global Transformer aggregator, and adaptive gate fusion, all described as an engineering design. Reported metrics (e.g., 99.4% Top-4 Recall on MuSiQue) are direct experimental results under a fixed E5-base-v2 candidate pool with offline embeddings. No equations, parameter fits, or self-citations are invoked to derive performance by construction; ablations simply confirm component contributions on standard benchmarks. The framework is self-contained against external retrieval and reranking baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no explicit free parameters, axioms, or invented entities can be extracted; the model implicitly relies on standard attention and Transformer components from prior literature.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 6 internal anchors

  1. [1]

    Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. 2024. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. In International Conference on Learning Representations

  2. [2]

    Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. 2024. BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. arXiv preprint arXiv:2402.03216 (2024)

  3. [3]

    Jian Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. 2024. BGE Reranker: Enhancing Retrieval with Cross-Encoder Fine-tuning. arXiv preprint arXiv:2403.15081 (2024)

  4. [4]

    Xilong Chen et al. 2020. PARADE: Passage Representation Aggregation for Document Reranking. In Proceedings of the 28th International Conference on Computational Linguistics. 5609–5620

  5. [5]

    Xilong Chen et al. 2024. FIRST: Faster Improved Listwise Reranking with Single Token Decoding. arXiv preprint arXiv:2406.15657 (2024)

  6. [6]

    Xilong Chen et al. 2024. TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

  7. [7]

    Yixiong Fang, Tianran Sun, Yuling Shi, and Xiaodong Gu. 2025. AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation. arXiv preprint arXiv:2503.10720 (2025)

  8. [8]

    Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. REALM: Retrieval-Augmented Language Model Pre-Training. In International Conference on Machine Learning. PMLR, 3929–3938

  9. [9]

    Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, and Akiko Aizawa. 2020. Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps. In Proceedings of the 28th International Conference on Computational Linguistics. 6609–6625

  10. [10]

    Gautier Izacard and Edouard Grave. 2021. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. arXiv preprint arXiv:2007.01282 (2021)

  11. [11]

    Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, and Lili Qin. 2023. LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 13358–13376

  12. [12]

    Ziyan Jiang, Xueguang Ma, and Wenhu Chen. 2024. LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs. arXiv preprint arXiv:2406.15319 (2024)

  13. [13]

    Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023. FLARE: Active Retrieval Augmented Generation. arXiv preprint arXiv:2305.06983 (2023)

  14. [14]

    Jina AI Team. 2025. Jina Reranker v3: Last but Not Late Interaction for Listwise Document Reranking. arXiv preprint arXiv:2509.25085 (2025)

  15. [15]

    Jeff Johnson, Matthijs Douze, and Hervé Jégou. 2019. Billion-Scale Similarity Search with GPUs. IEEE Transactions on Big Data 7, 3 (2019), 535–547

  16. [16]

    Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. 6769–6781

  17. [17]

    Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, and Matei Zaharia. 2022. Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval. In Advances in Neural Information Processing Systems, Vol. 35. 37645–37659

  18. [18]

    Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 39–48

  19. [19]

    Jie Lan, Jiaqi Chen, Zhengyi Liu, Chao Li, Siqian Bao, and Defu Lian. 2025. Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval. arXiv preprint arXiv:2509.24869 (2025)

    [Page header from the source PDF: DualView: Adaptive Local-Global Fusion for Multi-Hop Document Reranking. CIKM ’26, November 7–11, 2026, Rome, Italy]

  20. [20]

    Yibin Lei, Tao Shen, and Andrew Yates. 2025. ThinkQE: Query Expansion via an Evolving Thinking Process. arXiv preprint arXiv:2506.09260 (2025)

  21. [21]

    Md Mahadi Hasan Nahid and Davood Rafiei. 2025. PRISM: Agentic Retrieval with LLMs for Multi-Hop Question Answering. arXiv preprint arXiv:2510.14278 (2025). arXiv:2510.14278 [cs.IR]

  22. [22]

    Freda Shi, Xinyun Chen, Kensen Misra, Nathan Scales, David Dohan, Ed H Chi, Nathanael Schucher, Honglak Le, and Tenghao Zhou. 2023. Large Language Models Can Be Easily Distracted by Irrelevant Context. arXiv preprint arXiv:2302.00093 (2023)

  23. [23]

    Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, and Wen-tau Yih. 2023. REPLUG: Retrieval-Augmented Black-Box Language Models. arXiv preprint arXiv:2301.12652 (2023)

  24. [24]

    Yixin Su, Chuzhan Xiao, Yijia Gao, Youjia Lin, Wenbiao Liu, Fan Zhou, Shiyu Chang, Jingbo Zhou, Tong Zhang, et al. 2024. DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Need of LLMs. arXiv preprint arXiv:2401.04119 (2024)

  25. [25]

    Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, and Zhaochun Ren. 2023. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 14918–14937

  26. [27]

    MuSiQue: Multihop Questions via Single-hop Question Composition. Transactions of the Association for Computational Linguistics 10 (2022), 539–554

  27. [28]

    Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal

  28. [29]

    Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions. arXiv preprint arXiv:2212.10509 (2023)

  29. [30]

    Liang Wang et al. 2024. Self-Calibrated Listwise Reranking with Large Language Models. arXiv preprint arXiv:2411.04602 (2024)

  30. [31]

    Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Zhang, Rangan Majumder, and Furu Wei. 2022. Introducing E5: A Family of Embedding Models. arXiv preprint arXiv:2212.03533 (2022)

  31. [32]

    Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, and Ming Zhou. 2020. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. In Advances in Neural Information Processing Systems, Vol. 33. 5776–5788

  32. [33]

    Orion Weller, Kathryn Ricci, Eugene Yang, Andrew Yates, Dawn Lawrie, and Benjamin Van Durme. 2025. Rank1: Test-Time Compute for Reranking in Information Retrieval. arXiv preprint arXiv:2502.18418 (2025). arXiv:2502.18418 [cs.IR] doi:10.48550/arXiv.2502.18418

  33. [34]

    Lee Xiong, Ye Xiong, Ye Li, Kwok-Fung Tang, Jialin Wang, Jamie Callan, and Zhicheng Bai. 2021. Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. In International Conference on Learning Representations

  34. [35]

    Eugene Yang, Andrew Yates, Kathryn Ricci, Orion Weller, Vivek Chari, Benjamin Van Durme, and Dawn Lawrie. 2025. Rank-K: Test-Time Reasoning for Listwise Reranking. arXiv preprint arXiv:2505.14432 (2025). arXiv:2505.14432 [cs.IR]

  35. [36]

    Zhilin Yang et al. 2016. Inter-document Contextual Language Model. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 950–960

  36. [37]

    Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D Manning. 2018. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2369–2380

  37. [38]

    Hongjin Yu et al. 2024. Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models. arXiv preprint arXiv:2406.19215 (2024)

  38. [39]

    Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, and Jingren Zhou. 2025. Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models. arXiv preprint arXiv:2506.05176 (2025). arXiv:2506.05176 [cs.CL]. Tongyi Lab, Alibaba Group