State Soup: In-Context Skill Learning, Retrieval and Mixing

João Sacramento; Johannes von Oswald; Maciej Pióro; Maciej Wołczyk; Razvan Pascanu

arxiv: 2406.08423 · v1 · pith:PENMZTILnew · submitted 2024-06-12 · 💻 cs.LG · cs.AI


Maciej Pióro, Maciej Wołczyk, Razvan Pascanu, Johannes von Oswald, João Sacramento

classification 💻 cs.LG cs.AI

keywords in-context · learning · model · sequence · interpolation · merging · models · performance


A new breed of gated-linear recurrent neural networks has reached state-of-the-art performance on a range of sequence modeling problems. Such models naturally handle long sequences efficiently, as the cost of processing a new input is independent of sequence length. Here, we explore another advantage of these stateful sequence models, inspired by the success of model merging through parameter interpolation. Building on parallels between fine-tuning and in-context learning, we investigate whether we can treat internal states as task vectors that can be stored, retrieved, and then linearly combined, exploiting the linearity of recurrence. We study this form of fast model merging on Mamba-2.8b, a pretrained recurrent model, and present preliminary evidence that simple linear state interpolation methods suffice to improve next-token perplexity as well as downstream in-context learning task performance.
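The linearity argument in the abstract can be illustrated with a toy example. The sketch below is not Mamba and not the authors' code: it assumes a small diagonal gated-linear recurrence h_t = a_t * h_{t-1} + b_t * x_t, with a hypothetical `gates` function and made-up inputs. It stores the states reached after two different contexts, mixes them with weights that sum to 1, and checks that continuing from the mixed state gives the same result as mixing the two separate continuations.

```python
# Toy diagonal gated-linear recurrence (elementwise):
#   h_t = a_t * h_{t-1} + b_t * x_t
# Illustrative stand-in only; the gate function and inputs are hypothetical.

def gates(x):
    """Hypothetical input-dependent gates (a_t, b_t) for a 3-dimensional state."""
    a = [1.0 / (1.0 + abs(x) * (i + 1) * 0.1) for i in range(3)]
    b = [1.0 - ai for ai in a]
    return a, b

def run(state, inputs):
    """Process `inputs` starting from `state` and return the final state."""
    h = list(state)
    for x in inputs:
        a, b = gates(x)
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
    return h

def mix(states, weights):
    """Linearly combine stored states with the given weights."""
    dim = len(states[0])
    return [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]

zero = [0.0, 0.0, 0.0]

# "Store" one state per context, as a stand-in for a library of skills.
library = {
    "task_a": run(zero, [1.0, 2.0, 3.0]),
    "task_b": run(zero, [-1.0, 0.5, 4.0]),
}

# "Retrieve" two states and mix them with weights that sum to 1.
weights = [0.5, 0.5]
mixed = mix([library["task_a"], library["task_b"]], weights)

# Continue processing a new query from the mixed state ...
query = [0.3, -0.7]
from_mixed = run(mixed, query)

# ... and compare against mixing the two separate continuations.
mix_of_outputs = mix(
    [run(library["task_a"], query), run(library["task_b"], query)], weights
)
```

In this toy setting the two results agree up to floating-point error, because the update is affine in the state and the weights sum to 1. A full pretrained model adds nonlinear layers around the recurrence, so this identity should be read as intuition for a single recurrent layer rather than as a guarantee for the whole network.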



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

S0 Tuning: Zero-Overhead Adaptation of Hybrid Recurrent-Attention Models
cs.CL 2026-04 conditional novelty 7.0

S0 tuning optimizes the initial recurrent states of hybrid models, outperforming LoRA on HumanEval with zero inference overhead and showing partial cross-domain transfer.