RePo: Language Models with Context Re-Positioning

Deng Cai; Huayang Li; Richard Sproat; Tianyu Zhao

arxiv: 2512.14391 · v3 · pith:RNQCDEQAnew · submitted 2025-12-16 · 💻 cs.LG · cs.AI · cs.CL

Huayang Li, Tianyu Zhao, Deng Cai, Richard Sproat

keywords: repo, attention, context, information, models, structure, burden, contextual

In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid, fixed contextual structure by assigning linear or constant positional indices. This rigid position information places the full burden of organizing the input structure on the attention layers, reducing the attention that could be allocated to more critical information. To address this, we propose RePo, a novel mechanism that alleviates this burden on attention layers via context re-positioning. Unlike conventional approaches, RePo uses a differentiable module, $f_\phi$, to assign token positions that capture contextual dependencies, rather than relying on a pre-defined order. By continually pre-training the OLMo-2 1B and 7B models, we demonstrate that RePo consistently enhances performance on tasks involving noisy contexts, structured data, and longer context lengths, while maintaining competitive performance on general short-context tasks. Analysis reveals that RePo successfully allocates more attention mass to distant but relevant information, assigns positions in a dense and non-linear space, and captures the intrinsic structure of the input context. Our code is at https://github.com/SakanaAI/repo.
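
To make the idea of learned token positions concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes, without confirmation from the abstract, that positions are real-valued scalars and that they feed a rotary-style position encoding. The names `assign_positions`, `rotate`, and `score` are hypothetical, and a single linear map stands in for the module $f_\phi$.

```python
import math

def assign_positions(hidden, w, b=0.0):
    """Hypothetical stand-in for f_phi: one real-valued position per token.

    hidden: list of token vectors; w: weight vector of the same width.
    A real module would be trained end to end; this is just a linear map.
    """
    return [sum(h_i * w_i for h_i, w_i in zip(h, w)) + b for h in hidden]

def rotate(vec, pos, base=10000.0):
    """Rotary-style encoding of an even-length vector at a real-valued position.

    Assumption: a rotary encoding is used; the abstract does not specify this.
    """
    d = len(vec)
    out = []
    for k in range(0, d, 2):
        theta = pos * base ** (-k / d)  # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[k], vec[k + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def score(q, k, pos_q, pos_k):
    """Dot product of rotated query and key; depends only on pos_q - pos_k."""
    rq, rk = rotate(q, pos_q), rotate(k, pos_k)
    return sum(a * b for a, b in zip(rq, rk))

# Toy usage: three tokens with 2-dimensional hidden states.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
positions = assign_positions(hidden, w=[0.5, -0.25])
# Positions come from token content rather than from the index 0, 1, 2,
# so they need not be integers or monotonically increasing.
logit = score(hidden[2], hidden[0], positions[2], positions[0])
```

Because every step is built from differentiable operations, a position function of this kind could in principle be trained jointly with the rest of the model, which is the property the abstract attributes to $f_\phi$.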

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Generative 3D Gaussians with Learned Density Control
cs.GR 2026-05 unverdicted novelty 6.0

DeG models 3D Gaussians via learned octree density and uses VecSeq Sobol re-indexing to turn set generation into sequence modeling, claiming SOTA quality in single-image-to-3D.