
arxiv: 2507.16003 · v4 · pith:JPMCMOSSnew · submitted 2025-07-21 · 💻 cs.CL · cs.LG

Learning without training: The implicit dynamics of in-context learning

classification 💻 cs.CL cs.LG
keywords context, in-context, learning, patterns, training, without, during, forward

One of the most striking features of Large Language Models (LLMs) is their ability to learn in-context. Namely, at inference time, an LLM is able to learn new patterns without any additional weight update when these patterns are presented in the form of examples in the prompt, even if these patterns were not seen during training. The mechanisms through which this can happen are still largely unknown. In this work, we show that the stacking of a self-attention layer with an MLP allows the transformer block to implicitly modify the weights of the MLP layer according to the context. We argue through theoretical analysis and experimentation that this simple mechanism may help explain why LLMs demonstrate capabilities of in-context learning beyond what is captured during training. Specifically, we show that a standard forward pass with context is mathematically equivalent to a forward pass without context but with the MLP weights updated by a minimal low-rank update representing the context.
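The equivalence stated in the last sentence of the abstract can be checked numerically on a toy case. The abstract does not spell out the update, so the sketch below rests on assumptions: the first MLP layer is taken to be a plain linear map `W`, and the vectors `a_plain` and `a_ctx` are random stand-ins for the attention output for the query alone and for the query with context. The helper names `matvec` and `rank_one_update` are hypothetical. Under these assumptions, one rank-1 update that makes the identity hold is the outer product of `W (a_ctx - a_plain)` with `a_plain`, divided by the squared norm of `a_plain`.

```python
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def rank_one_update(W, a_plain, a_ctx):
    """Illustrative rank-1 update: (W @ delta_a) a_plain^T / ||a_plain||^2."""
    delta_a = [c - p for c, p in zip(a_ctx, a_plain)]
    u = matvec(W, delta_a)                  # column factor of the outer product
    norm_sq = sum(p * p for p in a_plain)   # assumed non-zero
    return [[u_i * p_j / norm_sq for p_j in a_plain] for u_i in u]

random.seed(0)
d_in, d_out = 4, 3
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a_plain = [random.gauss(0, 1) for _ in range(d_in)]  # stand-in: attention output, no context
a_ctx = [random.gauss(0, 1) for _ in range(d_in)]    # stand-in: attention output, with context

delta_W = rank_one_update(W, a_plain, a_ctx)
W_updated = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta_W)]

with_context = matvec(W, a_ctx)               # original weights, context-aware input
without_context = matvec(W_updated, a_plain)  # updated weights, context-free input

# The two pre-activations agree up to floating-point error.
max_gap = max(abs(p - q) for p, q in zip(with_context, without_context))
print(max_gap < 1e-9)
```

Because the two pre-activations coincide, any nonlinearity and further layers applied afterwards produce the same output in both cases. The update is rank 1 by construction, since it is a single outer product, which is in line with the "minimal low-rank update" wording of the abstract.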



Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Mitigating Many-shot Jailbreak Attacks with One Single Demonstration

    cs.CR 2026-05 conditional novelty 7.0

    A single safety demonstration appended at inference time mitigates many-shot jailbreak attacks by counteracting implicit malicious fine-tuning on harmful examples.

  2. Steer Like the LLM: Activation Steering that Mimics Prompting

    cs.CL 2026-05 unverdicted novelty 7.0

    PSR models that estimate token-specific steering coefficients from activations outperform standard activation steering and compare favorably to prompting on steering benchmarks.

  3. Evaluating Temporal Consistency in Multi-Turn Language Models

    cs.CL 2026-04 unverdicted novelty 7.0

    Language models frequently violate temporal scope stability in multi-turn dialogues by drifting toward present-day assumptions even when they possess the correct facts.

  4. Fast Spatial Memory with Elastic Test-Time Training

    cs.CV 2026-04 unverdicted novelty 6.0

    Elastic Test-Time Training stabilizes test-time updates via an elastic prior and moving-average anchor, enabling Fast Spatial Memory for scalable long-sequence 4D reconstruction with reduced memory use and fewer shortcuts.

  5. TTT3R: 3D Reconstruction as Test-Time Training

    cs.CV 2025-09 unverdicted novelty 5.0

    TTT3R derives a closed-form learning rate from memory-observation alignment confidence to boost length generalization in RNN-based 3D reconstruction by 2x in global pose estimation.