pith. sign in

arxiv: 2509.22854 · v2 · pith:4SFF2AUYnew · submitted 2025-09-26 · 💻 cs.CL

Train Once, Reuse Everywhere: Generalizable Implicit In-Context Learning by Routing Attention

classification 💻 cs.CL
keywords implicitattentionin-contextexistinggeneralizablelearningllmslogits
0
0 comments X
read the original abstract

Implicit in-context learning (ICL) has newly emerged as a promising paradigm that simulates ICL behaviors in the representation space of large language models (LLMs), aiming to attain few-shot performance at zero-shot cost. However, existing approaches largely rely on injecting shift vectors into residual flows, which are typically constructed from labeled demonstrations or task-specific alignment. Such designs fall short of utilizing the structural mechanisms underlying ICL and suffer from limited generalizability. To address this, we propose In-Context Routing (ICR), a novel implicit ICL method that captures and utilizes generalizable ICL patterns at the attention logits level. It extracts reusable structural directions that emerge during ICL and employs a learnable input-conditioned router to modulate attention logits accordingly, enabling an efficient train-once-and-reuse framework. We evaluate ICR on 12 real-world datasets spanning diverse domains and multiple LLMs. The results show that ICR consistently outperforms existing implicit ICL methods that require task-specific retrieval or training, while demonstrating robust generalization to out-of-domain tasks where they struggle. These findings position ICR to push the boundary of the practical value of ICL. The code is available at https://github.com/Lijiaqian1/In-Context-Routing.git.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Distributional Alignment as a Criterion for Designing Task Vectors in In-Context Learning

    cs.CL 2026-05 unverdicted novelty 6.0

    A distributional alignment metric d_NTP and a linear regression method LTV for task vectors that improves accuracy by 9.2% over baselines on classification and regression tasks across multiple LLMs.