arxiv: 2601.11443 · v2 · pith:BAMMJHOFnew · submitted 2026-01-16 · 💻 cs.CL

Predict the Retrieval! Test time adaptation for Retrieval Augmented Generation

classification 💻 cs.CL
keywords: domains, performance, specialized, ttarag, adaptation, approach, generation, language

Retrieval-Augmented Generation (RAG) has emerged as a powerful approach for enhancing large language models' question-answering capabilities through the integration of external knowledge. However, when adapting RAG systems to specialized domains, challenges arise from distribution shifts, resulting in suboptimal generalization performance. In this work, we propose TTARAG, a test-time adaptation method that dynamically updates the language model's parameters during inference to improve RAG system performance in specialized domains. Our method introduces a simple yet effective approach where the model learns to predict retrieved content, enabling automatic parameter adjustment to the target domain. Through extensive experiments across six specialized domains, we demonstrate that TTARAG achieves substantial performance improvements over baseline RAG systems. Code available at https://github.com/sunxin000/TTARAG.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. GAPD: Gold-Action Policy Distillation for Agentic Reinforcement Learning in Knowledge Base Question Answering

    cs.CL · 2026-05 · unverdicted · novelty 6.0

    GAPD adds dense token-level guidance from gold actions to outcome-based RL for KBQA via mid-anchor matching and outperforms SOTA on WebQSP, GrailQA, and GraphQ.

  2. EASE-TTT: Evidence-Aligned Selective Test-Time Training for Long-Context Question Answering

    cs.CL · 2026-06 · unverdicted · novelty 5.0

    EASE-TTT creates a soft attention target from evidence chunks to guide query-side test-time adaptation, yielding higher macro-average scores than full-context, retrieval-only, and standard qTTT baselines on six LongBe...