When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation

2026 · cs.SE · arXiv 2604.09515

1 Pith paper cites this work.

abstract

The rapid evolution of software libraries creates a significant challenge for Large Language Models (LLMs), whose static parametric knowledge often becomes stale after training. While retrieval-augmented generation (RAG) is commonly used to supply up-to-date API specifications, a "context-memory conflict" arises when the external instructions contradict a model's internal parametric knowledge. This paper presents a systematic empirical study of LLM code generation under API evolution (e.g., API deprecation, API modification, and API addition), based on a benchmark of 270 real-world updates from eight Python libraries. We evaluate 11 models from four LLM families. Our results show that without comprehensive documentation, LLMs struggle to prioritize external context: on average, only 42.55% of the generated code examples are executable in the target environment. While structured documentation and larger model scales improve LLMs' adoption of API updates, they do not fully resolve executability issues, with the executable rate reaching only 66.36%. In addition, reasoning-based strategies (e.g., Self-Reflection) significantly boost LLM performance, improving the executable rate by 11%. Our findings highlight the persistence of outdated patterns in LLM-generated code, even when API update specifications are provided, and emphasize the need for evolution-aware benchmarks and techniques.
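The abstract's headline metric is the share of generated code that actually runs in the target environment. The sketch below shows one way such an executable rate could be computed. It assumes that "executable" simply means the snippet exits with status 0 in a fresh interpreter, with the current Python standing in for the target environment. The paper's actual harness, library versions, and pass criteria are not described here. The function names `is_executable` and `executable_rate` are hypothetical. The standard-library `collections` example is an illustrative stand-in for the eight libraries in the benchmark: it shows an API removal that leaves "stale" code non-executable.

```python
import subprocess
import sys


def is_executable(snippet: str, timeout: float = 10.0) -> bool:
    """Hypothetical check: True if `snippet` runs to completion with exit
    code 0 in a fresh interpreter (the current Python stands in for the
    target environment)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,  # keep the snippet's output out of the report
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Treat hanging snippets as failures instead of blocking evaluation.
        return False
    return result.returncode == 0


def executable_rate(snippets: list[str]) -> float:
    """Fraction of snippets that execute successfully (0.0 for an empty list)."""
    if not snippets:
        return 0.0
    return sum(is_executable(s) for s in snippets) / len(snippets)


if __name__ == "__main__":
    # Two hypothetical model outputs for the same task. The first relies on
    # stale parametric knowledge: the ABC aliases in `collections` were
    # removed in Python 3.10 and must be imported from `collections.abc`.
    outdated = "from collections import Mapping\nprint(issubclass(dict, Mapping))"
    updated = "from collections.abc import Mapping\nprint(issubclass(dict, Mapping))"
    rate = executable_rate([outdated, updated])
    print(f"executable rate: {rate:.0%}")
```

On Python 3.10 or later, the outdated import raises `ImportError`, so this sketch reports a 50% executable rate. Running each snippet in a separate process keeps one failing or hanging snippet from affecting the others. A real benchmark would also pin the library versions that define the target environment.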

representative citing papers

When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context

cs.SE · 2026-05-14 · accept · novelty 7.0

Stale repository context in code RAG actively induces models to produce obsolete helper references, raising the rate of stale outputs by 76-88 percentage points relative to current-only retrieval in a 17-sample diagnostic study.