Learning without training: The implicit dynamics of in-context learning

Benoit Dherin; Hanna Mazzawi; Javier Gonzalvo; Michael Munn; Michael Wunder

Not yet reviewed by Pith; the record is open.

arXiv:2507.16003v4 · pith:JPMCMOSS · submitted 2025-07-21 · cs.CL, cs.LG

Benoit Dherin, Michael Munn, Hanna Mazzawi, Michael Wunder, Javier Gonzalvo

classification cs.CL cs.LG

keywords: context, in-context, learning, patterns, training, without, during, forward

Abstract

One of the most striking features of Large Language Models (LLMs) is their ability to learn in-context. Namely, at inference time, an LLM is able to learn new patterns without any additional weight update when these patterns are presented in the form of examples in the prompt, even if these patterns were not seen during training. The mechanisms through which this can happen are still largely unknown. In this work, we show that the stacking of a self-attention layer with an MLP allows the transformer block to implicitly modify the weights of the MLP layer according to the context. We argue through theoretical analysis and experimentation that this simple mechanism may help explain why LLMs demonstrate capabilities of in-context learning, beyond what is captured during training. Specifically, we show that a standard forward pass with context is mathematically equivalent to a forward pass without context but with the MLP weights updated by a minimal low-rank update representing the context.
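The equivalence claimed at the end of the abstract can be checked numerically on a toy block. The following is a minimal sketch, not the authors' code. It assumes a single-head attention with identity projections, an MLP first layer of the form relu(W a + b), no skip connection, and one particular rank-1 update, ΔW = (W Δa) a_plain^T / ||a_plain||^2, where Δa is the difference between the attention output with and without context. All function and variable names are hypothetical and chosen for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, v):
    return [dot(row, v) for row in W]


def attend(query, tokens):
    # Toy single-head attention; identity Q/K/V projections are an assumption.
    scale = math.sqrt(len(query))
    scores = [dot(query, t) / scale for t in tokens]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [sum(e / total * t[i] for e, t in zip(exps, tokens))
            for i in range(len(query))]


def mlp_first_layer(W, b, a):
    # relu(W a + b): the layer whose weights are implicitly updated.
    return [max(0.0, z + bias) for z, bias in zip(matvec(W, a), b)]


def implicit_update(W, a_ctx, a_plain):
    # Rank-1 update: outer product of (W @ delta_a) with a_plain / ||a_plain||^2.
    delta_a = [c - p for c, p in zip(a_ctx, a_plain)]
    w_delta = matvec(W, delta_a)
    norm_sq = dot(a_plain, a_plain)  # assumed nonzero
    return [[wd * ap / norm_sq for ap in a_plain] for wd in w_delta]


random.seed(0)
d_model, d_hidden, n_context = 4, 6, 3
W = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_hidden)]
b = [random.gauss(0, 1) for _ in range(d_hidden)]
context = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(n_context)]
x = [random.gauss(0, 1) for _ in range(d_model)]

a_ctx = attend(x, context + [x])   # query attends to context and itself
a_plain = attend(x, [x])           # query attends to itself only
delta_W = implicit_update(W, a_ctx, a_plain)
W_patched = [[w + dw for w, dw in zip(rw, rd)] for rw, rd in zip(W, delta_W)]

out_with_context = mlp_first_layer(W, b, a_ctx)       # original weights, with context
out_patched = mlp_first_layer(W_patched, b, a_plain)  # updated weights, no context
max_gap = max(abs(u - v) for u, v in zip(out_with_context, out_patched))
print(max_gap)  # agreement up to floating-point error
```

The two outputs agree because (W + ΔW) a_plain = W a_plain + W Δa = W a_ctx, so the context can be absorbed into the weights for this particular query. The update depends on the query token, and a block with skip connections or a different MLP form would need its own derivation.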

Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Cost-Aware Optimization for Agentic Query Execution
cs.DB 2026-06 unverdicted novelty 7.0

EnumGRPO is a self-improving optimizer for agentic query execution that reduces LLM-operator costs by ~317x while improving accuracy by 18% over a hybrid baseline across four databases.

Causal Interventions on Continuous Variables: A Case Study on Verb Bias in Steering Vectors for In-Context Learning
cs.CL 2026-05 unverdicted novelty 7.0

A method for causal intervention on continuous variables shows verb bias is causally encoded in LLM steering vectors and affects syntactic preferences, though links to in-context learning error signals are not causal.

Mitigating Many-shot Jailbreak Attacks with One Single Demonstration
cs.CR 2026-05 conditional novelty 7.0

A single safety demonstration appended at inference time mitigates many-shot jailbreak attacks by counteracting implicit malicious fine-tuning on harmful examples.

Steer Like the LLM: Activation Steering that Mimics Prompting
cs.CL 2026-05 unverdicted novelty 7.0

PSR models that estimate token-specific steering coefficients from activations outperform standard activation steering and compare favorably to prompting on steering benchmarks.

Evaluating Temporal Consistency in Multi-Turn Language Models
cs.CL 2026-04 unverdicted novelty 7.0

Language models frequently violate temporal scope stability in multi-turn dialogues by drifting toward present-day assumptions even when they possess the correct facts.

Prompting Complexity: Shortest Prompts for Texts and Behaviors in LLMs
cs.CL 2026-07 conditional novelty 6.0

The paper defines prompting complexity as the length of the shortest plausible prompt that deterministically generates a target text with a fixed language model.

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
cs.LG 2026-06 unverdicted novelty 6.0

Language models can use a two-stage sleep process of upward distillation for memory consolidation and RL-based dreaming for unsupervised self-improvement to enable continual learning.

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
cs.LG 2026-06 conditional novelty 6.0

Sleep-time Knowledge Seeding plus Dreaming lets LLMs expand capacity, distill fragile in-context memories into stable parameters, and self-improve without human labels.

Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection
cs.LG 2026-05 unverdicted novelty 6.0

SHIFT selects compact RLVR training subsets using the magnitude of hidden-state change from a single inference rollout plus quality-weighted farthest-first coverage, outperforming training-free baselines on math reaso...

Fast Spatial Memory with Elastic Test-Time Training
cs.CV 2026-04 unverdicted novelty 6.0

Elastic Test-Time Training stabilizes test-time updates via an elastic prior and moving-average anchor, enabling Fast Spatial Memory for scalable long-sequence 4D reconstruction with reduced memory use and fewer shortcuts.

TTT3R: 3D Reconstruction as Test-Time Training
cs.CV 2025-09 unverdicted novelty 5.0

TTT3R derives a closed-form learning rate from memory-observation alignment confidence to boost length generalization in RNN-based 3D reconstruction by 2x in global pose estimation.