uva-irlab-conv at SemEval-2026 Task 8: Multi-Turn RAG with Learned Sparse Retrieval and Listwise Reranking

Kidist Amde Mekonnen; Mohammad Aliannejadi; Simon Lupart; Zahra Abbasiantaeb

arXiv: 2606.11945 · v1 · pith:C6WOUVP4new · submitted 2026-06-10 · cs.CL · cs.IR

Pith reviewed 2026-06-27 10:11 UTC · model grok-4.3

classification cs.CL · cs.IR

keywords multi-turn retrieval · retrieval-augmented generation · learned sparse retrieval · listwise reranking · conversational question answering · unanswerable queries · SemEval task

The pith

A multi-turn RAG pipeline uses learned sparse retrieval and LLM listwise reranking to integrate full conversation history across four domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a system for SemEval-2026 Task 8 that performs multi-turn retrieval and question answering over collections in finance, cloud documentation, government, and Wikipedia. It relies on learned sparse retrieval to fetch evidence without domain-specific tuning and uses LLMs to rewrite queries, perform pointwise and listwise reranking, and generate answers, with every step conditioned on the complete conversational history. The design explicitly addresses unanswerable queries where the collection lacks sufficient evidence. The central goal is to show that this staged integration of context produces more robust results than single-step approaches.
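
To make the retrieval step concrete, the following is a minimal sketch of how learned sparse retrieval typically scores passages: queries and passages are sparse term-weight maps, and relevance is their dot product, computed through an inverted index. The term weights, passage ids, and function names are illustrative assumptions. In a real system a trained encoder (not shown) would produce the weights, and nothing here is taken from the paper's implementation.

```python
from collections import defaultdict

def build_inverted_index(passages):
    """Map each term to a list of (passage_id, weight) postings."""
    index = defaultdict(list)
    for pid, vec in passages.items():
        for term, weight in vec.items():
            index[term].append((pid, weight))
    return index

def sparse_search(index, query_vec, k=10):
    """Score passages by the dot product of sparse query and passage vectors."""
    scores = defaultdict(float)
    for term, q_weight in query_vec.items():
        for pid, d_weight in index.get(term, []):
            scores[pid] += q_weight * d_weight
    # Sort by descending score, breaking ties by passage id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]

# Hypothetical term weights standing in for the output of a learned sparse encoder.
passages = {
    "p1": {"interest": 1.5, "rate": 1.0, "loan": 0.5},
    "p2": {"cloud": 2.0, "storage": 1.0},
    "p3": {"rate": 0.5, "limit": 1.0, "cloud": 0.5},
}
index = build_inverted_index(passages)
results = sparse_search(index, {"rate": 1.0, "cloud": 0.5}, k=2)
```

Because scoring only touches the postings of terms present in the query, the same index structure can serve collections from different domains without retraining, which is the property the summary above attributes to sparse retrieval.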

Core claim

The multi-step design enables effective integration of conversational context throughout retrieval and generation, improving robustness across domains.

What carries the argument

A multi-turn retrieval-augmented generation pipeline that uses learned sparse retrieval as the primary retriever and applies LLMs to query rewriting, pointwise and listwise reranking, and final generation, with each LLM step conditioned on the full conversational history.
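
As a rough illustration of how such stages might be composed, the sketch below wires the components together as plain callables and passes the full history to every LLM-backed stage. The stage order (rewrite, retrieve, pointwise rerank, listwise rerank, generate), the `run_turn` name, and all toy stand-ins are assumptions made for this example, not the authors' code.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (speaker, utterance) pairs, oldest first

def run_turn(
    history: History,
    question: str,
    rewrite: Callable[[History, str], str],
    retrieve: Callable[[str], List[str]],
    rerank_pointwise: Callable[[History, str, List[str]], List[str]],
    rerank_listwise: Callable[[History, str, List[str]], List[str]],
    generate: Callable[[History, str, List[str]], str],
    top_k: int = 3,
) -> str:
    """Run one conversational turn; every LLM-backed stage sees the full history."""
    query = rewrite(history, question)
    candidates = retrieve(query)
    candidates = rerank_pointwise(history, question, candidates)
    candidates = rerank_listwise(history, question, candidates)
    return generate(history, question, candidates[:top_k])

# Toy stand-ins (hypothetical) so the sketch runs without any model or index.
def toy_rewrite(history, question):
    last_user = next((u for s, u in reversed(history) if s == "user"), "")
    return f"{last_user} {question}".strip()

def toy_retrieve(query):
    corpus = ["savings account rate", "cloud storage limit", "savings account fees"]
    terms = set(query.lower().split())
    return [p for p in corpus if terms & set(p.split())]

def keep_order(history, question, passages):
    return passages

def toy_generate(history, question, passages):
    return passages[0] if passages else "I cannot answer from the available documents."

stages = (toy_rewrite, toy_retrieve, keep_order, keep_order, toy_generate)
answer = run_turn(
    [("user", "Tell me about savings"), ("agent", "Sure.")],
    "what are the fees",
    *stages,
)
no_evidence = run_turn([], "zzz", *stages)
```

Keeping each stage behind a callable makes it easy to swap a toy stand-in for a real retriever or LLM call, and to disable a stage when measuring its contribution.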

If this is right

Sparse retrieval serves as the primary method because it generalizes without per-domain training.
LLM long-context handling allows rewriting, reranking, and generation to use the entire conversation history at once.
The pipeline can identify unanswerable queries by checking whether retrieved evidence is sufficient.
Listwise reranking selects better passages than retrieval scores alone for the generation step.
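
For the listwise reranking point, one common pattern in prior LLM reranking work is to show the model a numbered list of passages and ask for an ordering such as `[2] > [1] > [3]`. The sketch below parses such an ordering defensively. The output format and the `apply_listwise_ranking` helper are assumptions for illustration; the paper's actual prompt and parsing are not described here.

```python
import re

def apply_listwise_ranking(passages, llm_output):
    """Reorder passages using a ranking string such as '[2] > [1] > [3]'.

    Identifiers are 1-based. Out-of-range or repeated identifiers are ignored,
    and passages the model did not mention keep their original relative order
    at the end of the list.
    """
    order = []
    for match in re.findall(r"\[(\d+)\]", llm_output):
        idx = int(match) - 1
        if 0 <= idx < len(passages) and idx not in order:
            order.append(idx)
    order += [i for i in range(len(passages)) if i not in order]
    return [passages[i] for i in order]

passages = ["passage A", "passage B", "passage C", "passage D"]
reranked = apply_listwise_ranking(passages, "[3] > [1] > [3] > [9]")
```

Falling back to the original order for unmentioned passages means a malformed model response degrades to the retrieval ranking instead of dropping evidence.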

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same staged pipeline could be tested on other conversational retrieval benchmarks that include unanswerable questions.
Removing any single LLM step (rewriting or listwise reranking) and measuring the drop would isolate which component drives the claimed robustness.
The approach leaves open whether the same gains appear when the underlying LLM is smaller or when retrieval is restricted to shorter contexts.
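
The ablation suggested above can be organized as a small harness that disables one component at a time and records the score drop. This is a sketch only: `ablation_deltas` and the toy scorer are hypothetical, and the integer contributions are placeholders, not results from the paper.

```python
def ablation_deltas(evaluate, components):
    """Return the score drop caused by disabling each component in turn.

    `evaluate` takes the set of enabled component names and returns a score
    where higher is better.
    """
    full_set = frozenset(components)
    full_score = evaluate(full_set)
    return {name: full_score - evaluate(full_set - {name}) for name in components}

# Placeholder scorer with made-up integer contributions; NOT results from the paper.
TOY_CONTRIBUTION = {"query_rewriting": 10, "listwise_reranking": 5}

def toy_evaluate(enabled):
    return 40 + sum(TOY_CONTRIBUTION[name] for name in enabled)

deltas = ablation_deltas(toy_evaluate, ["query_rewriting", "listwise_reranking"])
```

In practice `evaluate` would rerun the pipeline on a held-out set with the given components enabled and return the task metric.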

Load-bearing premise

Learned sparse retrieval generalizes strongly across the four domains without domain-specific adaptation and LLM listwise reranking measurably improves end-to-end performance.

What would settle it

Measurements on the task test set showing that a domain-adapted dense retriever or a simpler pointwise reranker produces higher final answer quality than the reported pipeline on at least two of the four domains.

Figures

Figures reproduced from arXiv: 2606.11945 by Kidist Amde Mekonnen, Mohammad Aliannejadi, Simon Lupart, Zahra Abbasiantaeb.

**Figure 2.** Response generation performance for different …

**Figure 4.** Retrieval performance at varying depths.

**Figure 5.** Response generation at varying depths.

Original abstract

This report describes our participation in SemEval-2026 Task 8 on multi-turn retrieval and question answering. The task evaluates conversational systems across four domains (finance, cloud documentation, government, Wikipedia), and includes unanswerable queries where the available collection does not contain sufficient evidence to produce a complete response. We propose a multi-turn retrieval-augmented generation pipeline that combines learned sparse retrieval with LLM-based reranking and generation. Using sparse retrieval as the primary retrieval method, we leverage its strong generalization across domains. In addition, we make use of the long-context capabilities of LLMs for conversational query rewriting, pointwise and listwise reranking, and generating the final response, each conditioned on the full conversational history. This multi-step design enables effective integration of conversational context throughout retrieval and generation, improving robustness across domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

A standard SemEval system report that strings together existing RAG pieces but supplies no metrics, ablations, or comparisons to back its robustness claims.

This is a participation report for SemEval-2026 Task 8 on multi-turn retrieval and QA. The system uses learned sparse retrieval as its primary retriever and applies LLMs to conversational query rewriting, pointwise and listwise reranking, and final generation, all conditioned on the full conversation history. It targets four domains and addresses unanswerable queries.

The pipeline itself is assembled from routine components. Sparse retrieval for cross-domain work and LLM listwise reranking are both established moves; nothing here derives a new framework or shows a first-principles improvement. The paper does a clear job describing how the pieces fit together for the shared task.

The real limitation is the complete absence of results. No retrieval scores, no end-to-end numbers, no ablation on the reranking step, and no baseline or per-domain breakdown appear. The assertion that the multi-step design improves robustness across domains therefore stays untested. The stress-test note is right on this point.

People already working on the same SemEval task or building similar conversational systems might skim it for implementation details. Anyone looking for new methods or validated gains will find little. It does not rise to the level that needs referee time.

Referee Report

2 major / 2 minor

Summary. The paper describes the uva-irlab-conv participation in SemEval-2026 Task 8 on multi-turn retrieval-augmented generation and QA. It proposes a pipeline that uses learned sparse retrieval as the primary retriever, combined with LLM-based conversational query rewriting, pointwise and listwise reranking, and final answer generation, all conditioned on full conversational history, and claims that this design enables effective context integration and improves robustness across the four evaluation domains (finance, cloud documentation, government, Wikipedia) while handling unanswerable queries.

Significance. If the claimed robustness gains were demonstrated through evaluation, the work would provide a concrete example of combining sparse retrieval generalization with LLM context handling for conversational QA; however, the complete absence of any metrics, ablations, or comparisons leaves the significance of the design choices unevaluated.

major comments (2)

[Abstract] The assertion that the multi-step design 'improves robustness across domains' is load-bearing for the paper's contribution, yet it is unsupported by any retrieval metrics (e.g., nDCG, recall), ablation results, baseline comparisons (dense retrieval, single-turn systems), or per-domain breakdowns, which leaves the generalization and improvement claims unevaluable.
[The manuscript as a whole] The manuscript provides no experimental section or results table reporting official task scores, comparison against other participants, or analysis of the contribution of listwise reranking versus pointwise reranking or full-history conditioning.
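
For readers unfamiliar with the retrieval metrics the referee names, here is a minimal sketch of recall@k and nDCG@k. It assumes linear gain with a log2 rank discount (one of several nDCG variants) and unique passage ids in each ranking; official task scoring may differ, and the function names are illustrative.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant passages that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)

def dcg_at_k(gains, k):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))

def ndcg_at_k(ranked_ids, relevance, k):
    """nDCG@k, where `relevance` maps passage id to a graded relevance label."""
    gains = [relevance.get(pid, 0) for pid in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy ranking: the single relevant passage is retrieved at rank 2.
ranking = ["p3", "p1"]
labels = {"p1": 1}
ndcg = ndcg_at_k(ranking, labels, k=2)
recall_1 = recall_at_k(ranking, set(labels), k=1)
recall_2 = recall_at_k(ranking, set(labels), k=2)
```

Here the relevant passage at rank 2 earns a discounted gain of `1 / log2(3)` against an ideal of 1, so nDCG@2 is below 1 even though recall@2 is perfect; reporting both separates "was the evidence found" from "was it ranked well".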

minor comments (2)

[Abstract] The description of the four domains and the unanswerable-query handling would be clearer if accompanied by concrete examples of query rewriting or reranking prompts.
Standard SemEval system papers typically include the team's official ranking and primary metric values; their omission here weakens the report's utility to the shared-task community.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the feedback on our system description paper. We address the major comments point by point below, noting that this is a concise participation report for a shared task.

Referee: [Abstract] The assertion that the multi-step design 'improves robustness across domains' is load-bearing for the paper's contribution, yet it is unsupported by any retrieval metrics (e.g., nDCG, recall), ablation results, baseline comparisons (dense retrieval, single-turn systems), or per-domain breakdowns, which leaves the generalization and improvement claims unevaluable.

Authors: We agree that the abstract makes an unsupported claim about robustness improvements. The statement was intended to reflect the design rationale (leveraging learned sparse retrieval for cross-domain generalization and LLM conditioning on full history), but no quantitative evidence is provided in the manuscript. We will revise the abstract to remove this claim and describe the pipeline components without asserting empirical gains.

revision: yes

Referee: [The manuscript as a whole] The manuscript provides no experimental section or results table reporting official task scores, comparison against other participants, or analysis of the contribution of listwise reranking versus pointwise reranking or full-history conditioning.

Authors: This manuscript is a system description focused on the pipeline architecture rather than a full experimental study. Official task scores are aggregated in the SemEval task overview rather than individual reports, and we did not run the requested ablations or comparisons during participation. We will add a brief results section reporting any available official scores in revision, but component-level analysis is not available from our work.

revision: partial

standing simulated objections not resolved

The manuscript contains no experimental results, metrics, ablations, or comparisons, which prevents providing the requested evidence or analysis.

Circularity Check

0 steps flagged

No circularity; purely descriptive system report with no derivations or fitted predictions

The paper is a participation report for a SemEval shared task. It describes a retrieval-augmented generation pipeline using learned sparse retrieval, LLM query rewriting, pointwise/listwise reranking, and response generation, all conditioned on conversational history. No equations, parameters, derivations, or quantitative predictions appear in the provided text. Claims about generalization and robustness are presented as design motivations rather than results derived from prior steps within the paper. No self-citations, ansatzes, or renamings reduce any claim to its own inputs by construction. The work is self-contained as an engineering description.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model, free parameters, axioms, or invented entities are present; the paper is an applied system description for a shared task.

[1] [1]

Proceedings of the 34th Text REtrieval Conference (TREC 2025)(NIST SP xxxx)

UvAIRLab at iKAT25: Exploring Learned Sparse Retrieval and Query Rewriting for Personalized Conversational QA , author=. Proceedings of the 34th Text REtrieval Conference (TREC 2025)(NIST SP xxxx). Gaithersburg, Maryland , year=

2025

[2] [2]

Proceedings of the 61st annual meeting of the association for computational linguistics (volume 1: long papers) , pages=

Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions , author=. Proceedings of the 61st annual meeting of the association for computational linguistics (volume 1: long papers) , pages=

[3] [3]

The Twelfth International Conference on Learning Representations , year=

Self-rag: Learning to retrieve, generate, and critique through self-reflection , author=. The Twelfth International Conference on Learning Representations , year=

[4] [4]

The eleventh international conference on learning representations , year=

React: Synergizing reasoning and acting in language models , author=. The eleventh international conference on learning representations , year=

[5] [5]

Bruce Croft, Erik Learned-Miller, and Jaap Kamps

Zamani, Hamed and Dehghani, Mostafa and Croft, W. Bruce and Learned-Miller, Erik and Kamps, Jaap , title =. Proceedings of the 27th ACM International Conference on Information and Knowledge Management , pages =. 2018 , isbn =. doi:10.1145/3269206.3271800 , abstract =

work page doi:10.1145/3269206.3271800 2018

[6] [6]

Companion proceedings of the the web conference 2018 , pages=

Www'18 open challenge: financial opinion mining and question answering , author=. Companion proceedings of the the web conference 2018 , pages=

2018

[7] [7]

Transactions of the Association for Computational Linguistics , volume=

CLAPnq: C ohesive L ong-form A nswers from P assages in Natural Questions for RAG systems , author=. Transactions of the Association for Computational Linguistics , volume=. 2025 , publisher=

2025

[8] [8]

Proceedings of the 14th ACM international conference on web search and data mining , pages=

Question rewriting for conversational question answering , author=. Proceedings of the 14th ACM international conference on web search and data mining , pages=

[9] [9]

Proceedings of the 2023 conference on empirical methods in natural language processing , pages=

Is ChatGPT good at search? investigating large language models as re-ranking agents , author=. Proceedings of the 2023 conference on empirical methods in natural language processing , pages=

2023

[10] [10]

Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval , pages=

Conversational information seeking: Theory and application , author=. Proceedings of the 45th international ACM SIGIR conference on research and development in information retrieval , pages=

[11] [11]

Transactions of the Association for Computational Linguistics , volume=

Evaluating correctness and faithfulness of instruction-following models for question answering , author=. Transactions of the Association for Computational Linguistics , volume=. 2024 , publisher=

2024

[12] [12]

RAD-Bench: Evaluating large language models’ capabilities in retrieval augmented dialogues , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track) , pages=

2025

[13] [13]

IR Evaluation Methods for Retrieving Highly Relevant Documents , booktitle =

J\". IR Evaluation Methods for Retrieving Highly Relevant Documents , booktitle =. 2000 , isbn =. doi:10.1145/345508.345545 , acmid =

work page doi:10.1145/345508.345545 2000

[14] [14]

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval , pages=

Efficient inverted indexes for approximate retrieval over learned sparse representations , author=. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval , pages=

[15] [15]

Achiam, Josh and Adler, Steven and Agarwal, Sandhini and Ahmad, Lama and Akkaya, Ilge and Aleman, Florencia Leoni and Almeida, Diogo and Altenschmidt, Janko and Altman, Sam and Anadkat, Shyamal and others , journal=

[16] [16]

2025 , institution=

How people use ChatGPT , author=. 2025 , institution=

2025

[17] [17]

arXiv preprint arXiv:2505.09388 , year=

Qwen3 Technical Report , author=. arXiv preprint arXiv:2505.09388 , year=

Pith/arXiv arXiv

[18] [18]

arXiv preprint arXiv:2601.13115 , year=

Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning , author=. arXiv preprint arXiv:2601.13115 , year=

Pith/arXiv arXiv

[19] [19]

U ni C onv: Unifying Retrieval and Response Generation for Large Language Models in Conversations

Mo, Fengran and Gao, Yifan and Meng, Chuan and Liu, Xin and Wu, Zhuofeng and Mao, Kelong and Wang, Zhengyang and Chen, Pei and Li, Zheng and Li, Xian and Yin, Bing and Jiang, Meng. U ni C onv: Unifying Retrieval and Response Generation for Large Language Models in Conversations. Proceedings of the 63rd Annual Meeting of the Association for Computational L...

work page doi:10.18653/v1/2025.acl-long.344 2025

[20] [20]

arXiv preprint arXiv:2510.13312 , year=

Chatr1: Reinforcement learning for conversational reasoning and retrieval augmented question answering , author=. arXiv preprint arXiv:2510.13312 , year=

Pith/arXiv arXiv

[21]

Lupart, Simon and van Dijk, Dani\". Investigating LLM Variability in Personalized Conversational Information Retrieval. Proceedings of the 2025 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region. 2025. doi:10.1145/3767695.3769502

[22]

Nguyen, Thong and MacAvaney, Sean and Yates, Andrew. Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, 2023, Proceedings, Part III. 2023. doi:10.1007/978-3-031-28241-6_7

[23]

Zeng, Hansi and Killingback, Julian and Zamani, Hamed. Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2025. doi:10.1145/3726302.3730225

[24]

Formal, Thibault and Lassance, Carlos and Piwowarski, Benjamin and Clinchant, St\'. From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2022. doi:10.1145/3477495.3531857

[25]

Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997.

[26]

Elgohary, Ahmed and Peskov, Denis and Boyd-Graber, Jordan. Can You Unpack That? Learning to Rewrite Questions-in-Context. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1605

[27]

Yu, Shi and Liu, Zhenghao and Xiong, Chenyan and Feng, Tao and Liu, Zhiyuan. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. doi:10.1145/3404835.3462856

[28]

Embracing Plasticity: Balancing Stability and Plasticity in Continual Recommender Systems

Lupart, Simon and Aliannejadi, Mohammad and Kanoulas, Evangelos. Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2025. doi:10.1145/3726302.3729966

[29]

IRLab@iKAT24: Learned Sparse Retrieval with Multi-aspect LLM Query Generation for Conversational Search. arXiv preprint arXiv:2411.14739.

[30]

Generating Multi-Aspect Queries for Conversational Search. arXiv preprint arXiv:2403.19302.

[31]

CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search. arXiv preprint arXiv:2406.05013.

[32]

Mao, Kelong and Dou, Zhicheng and Mo, Fengran and Hou, Jiewen and Chen, Haonan and Qian, Hongjin. Large Language Models Know Your Contextual Search Intent: A Prompting Framework for Conversational Search. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.86

[33]

Es, Shahul and James, Jithin and Espinosa Anke, Luis and Schockaert, Steven. RAGAs: Automated Evaluation of Retrieval Augmented Generation. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations. 2024. doi:10.18653/v1/2024.eacl-demo.16

[34]

Anantha, Raviteja and Vakulenko, Svitlana and Tu, Zhucheng and Longpre, Shayne and Pulman, Stephen and Chappidi, Srinivas. Open-Domain Question Answering Goes Conversational via Question Rewriting. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.44

[35]

Adlakha, Vaibhav and Dhuliawala, Shehzaad and Suleman, Kaheer and de Vries, Harm and Reddy, Siva. TopiOCQA: Open-domain Conversational Question Answering with Topic Switching. Transactions of the Association for Computational Linguistics. 2022. doi:10.1162/tacl_a_00471

[36]

Formal, Thibault and Piwowarski, Benjamin and Clinchant, St\'. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2021. doi:10.1145/3404835.3463098

[37]

CAsT 2020: The Conversational Assistance Track Overview. Text Retrieval Conference. 2020.

[38]

Aliannejadi, Mohammad and Abbasiantaeb, Zahra and Chatterjee, Shubham and Dalton, Jeffrey and Azzopardi, Leif. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2024. doi:10.1145/3626772.3657860

[39]

Abbasiantaeb, Zahra and Lupart, Simon and Azzopardi, Leif and Dalton, Jeffrey and Aliannejadi, Mohammad. Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2025. doi:10.1145/3726302.3730316

[40]

User simulation in practice: Lessons learned from three shared tasks. SIGIR Forum.

[41]

Philippe Laban and Hiroaki Hayashi and Yingbo Zhou and Jennifer Neville. 2026.

[42]

Radlinski, Filip and Craswell, Nick. Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval. 2017. doi:10.1145/3020165.3020183

[43]

MTRAG-UN: A Benchmark for Open Challenges in Multi-Turn RAG Conversations. 2026.

[44]

SemEval-2026 Task 8: MTRAGEval: Evaluating Multi-Turn RAG Conversations. Proceedings of the 20th International Workshop on Semantic Evaluation (SemEval-2026). 2026.

[45]

Mekonnen, Kidist Amde and Tang, Yubao and de Rijke, Maarten. Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2025. doi:10.1145/3726302.3730023