Understanding Fact Recall in Language Models: Why Two-Stage Training Encourages Memorization but Mixed Training Teaches Knowledge
While fine-tuning is the standard approach for injecting factual knowledge into large language models (LLMs), the mechanisms that enable reliable fact recall from unseen queries remain poorly understood. Common two-stage training strategies, which train sequentially on fact-storage and query formats, often lead to rote memorization. In contrast, mixed training jointly optimizes both formats and exhibits superior generalized recall. We investigate this success by comparing the two paradigms across LLMs with 2.8B to 4B parameters and identify the core mechanism: the joint optimization objective in mixed training induces gradient consistency across storage and query formats. This in turn drives representation consistency between the two formats, establishing a format-invariant retrieval process that maps unseen queries to stored facts. By comparison, the lack of such an objective in two-stage training results in inconsistent representations and failed recall. The consistency further localizes to the parameters updated by both formats, a set that is substantially larger under mixed training than under two-stage training. At the input level, the consistency leaves an interpretable signature: mixed training encodes facts in storage format from the subject-relation tokens, the same components available in queries, while two-stage training relies on the full context. Our findings characterize the mechanisms of fact recall and offer a mechanistic foundation for optimizing knowledge injection in LLMs.
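To make "gradient consistency" concrete, one common way to quantify agreement between two gradients is their cosine similarity. The abstract does not state which metric the authors use, so the sketch below is only an illustrative assumption: a toy linear model with squared-error loss, where `storage_example` and `query_example` are hypothetical stand-ins for the same fact presented in a storage format and a query format. All function and variable names are invented for illustration.

```python
import math


def grad_squared_error(weights, features, target):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w for one example."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - target
    return [error * x for x in features]


def cosine_similarity(g1, g2):
    """Cosine of the angle between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(g1, g2))
    norm1 = math.sqrt(sum(a * a for a in g1))
    norm2 = math.sqrt(sum(b * b for b in g2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


# Hypothetical toy setup: feature 0 is shared by both formats (think
# "subject-relation" content), features 1 and 2 are format-specific.
weights = [0.0, 0.0, 0.0]
storage_example = ([1.0, 1.0, 0.0], 1.0)  # (features, target), assumed
query_example = ([1.0, 0.0, 1.0], 1.0)    # (features, target), assumed

g_storage = grad_squared_error(weights, *storage_example)
g_query = grad_squared_error(weights, *query_example)

# Positive values mean the two formats push the shared parameters
# in a compatible direction; values near zero or below mean they do not.
consistency = cosine_similarity(g_storage, g_query)
print(consistency)
```

In this toy case the only overlap between the two gradients comes from the shared feature, which loosely mirrors the abstract's remark that consistency localizes to parameters updated by both formats. It says nothing about how the effect is measured in actual LLMs.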
Forward citations
Cited by 1 Pith paper
- Deep sequence models tend to memorize geometrically; it is unclear why
Deep sequence models develop geometric memory in embeddings that encodes novel global relationships, transforming l-fold composition tasks into 1-step navigation via a natural spectral bias connected to Node2Vec.