Context Rot in AI-Assisted Software Development: Repurposing Documentation Consistency for AI Configuration Artifacts

Christoph Treude; Sebastian Baltes

arxiv: 2606.09090 · v1 · pith:YQ2S24NOnew · submitted 2026-06-08 · 💻 cs.SE · cs.AI

Pith reviewed 2026-06-27 15:54 UTC · model grok-4.3

keywords context rot · AI configuration artifacts · CLAUDE.md · documentation consistency · stale references · README consistency · AI coding assistants · software documentation

The pith

Existing documentation consistency checkers can already detect context rot in AI configuration files such as CLAUDE.md.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Developers supply AI coding assistants with persistent context through dedicated files that describe code elements, architecture, and conventions. These files can grow stale as the underlying software evolves, which the authors label context rot. The paper connects this issue to decades of research on documentation-code consistency and argues that the existing toolbox of checkers for READMEs, comments, and architecture descriptions offers an immediate way to surface the problem. As initial evidence, the authors ran one such checker on a representative sample of 356 repositories and found stale code element references in 23 percent of them. The work therefore positions traditional consistency detection methods as a practical starting point rather than requiring entirely new mechanisms.

Core claim

The paper claims that context rot in AI configuration artifacts is the same underlying consistency problem studied for traditional documentation, so existing detection approaches transfer directly and provide both immediate detection capability and a mapped research agenda for the new artifacts.

What carries the argument

Repurposing of an existing README/wiki consistency checker to scan AI configuration artifacts for stale code element references.
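
To make the repurposing idea concrete, here is a minimal sketch of stale code element reference detection. It is not the checker used in the paper: the backtick heuristic, the function names, and the in-memory `source_files` mapping are all illustrative assumptions.

```python
import re

# Assumption: code element references appear as backticked identifiers or paths.
CODE_REF = re.compile(r"`([A-Za-z_][\w./-]*)`")


def extract_references(doc_text: str) -> set[str]:
    """Collect backticked tokens that look like identifiers or file paths."""
    return set(CODE_REF.findall(doc_text))


def find_stale_references(doc_text: str, source_files: dict[str, str]) -> set[str]:
    """Return references that name no known file and occur in no source text.

    source_files maps repository-relative paths to file contents.
    """
    stale = set()
    for ref in extract_references(doc_text):
        names_a_file = any(p == ref or p.endswith("/" + ref) for p in source_files)
        # Match the reference as a whole token, not as part of a longer name.
        pattern = re.compile(rf"(?<!\w){re.escape(ref)}(?!\w)")
        appears_in_code = any(pattern.search(text) for text in source_files.values())
        if not (names_a_file or appears_in_code):
            stale.add(ref)
    return stale


# Hypothetical CLAUDE.md excerpt and a one-file repository snapshot.
config_text = "Call `parse_config` in `src/config.py`; use `load_settings` for defaults."
repo = {"src/config.py": "def parse_config(path):\n    return {}\n"}
print(sorted(find_stale_references(config_text, repo)))  # prints ['load_settings']
```

A real checker would also need to handle renames, version history, and references that are not backticked; this sketch only shows the core lookup.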

If this is right

Context rot can be identified in current AI configuration files using tools that already exist.
A research roadmap exists that maps each established documentation consistency technique to a corresponding problem in AI configuration artifacts.
Prevalence data from hundreds of repositories shows the issue occurs at measurable rates today.
Maintenance of AI context files can borrow directly from long-standing practices for keeping documentation in sync with code.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Teams maintaining AI coding assistants could embed these checkers into their update pipelines to warn users when context files drift.
The same analogy might extend to other persistent AI memory mechanisms such as vector stores or prompt histories that also reference code elements.
Empirical studies could test whether fixing the 23 percent of detected cases measurably improves AI assistant output quality over time.

Load-bearing premise

The assumption that consistency problems between traditional documentation and code are sufficiently similar to those between AI configuration files and code that the same detection methods apply without major adaptation.

What would settle it

A controlled comparison in which the same set of AI configuration files is examined both by traditional consistency checkers and by human experts or new AI-specific rules, revealing that the traditional tools miss the majority of actual staleness cases.
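
The decision rule implied by such a comparison can be sketched as a recall computation. This is an illustrative reading of "miss the majority", assuming expert labels are treated as ground truth and stale references are identified by name; the function names and the 0.5 threshold are assumptions, not part of the paper.

```python
def checker_recall(expert_stale: set[str], checker_flagged: set[str]) -> float:
    """Share of expert-confirmed stale references that the checker also flagged."""
    if not expert_stale:
        raise ValueError("recall is undefined without any expert-confirmed cases")
    return len(expert_stale & checker_flagged) / len(expert_stale)


def misses_majority(expert_stale: set[str], checker_flagged: set[str]) -> bool:
    """True if the checker finds fewer than half of the expert-confirmed cases."""
    return checker_recall(expert_stale, checker_flagged) < 0.5


# Hypothetical labels for one configuration file.
expert = {"load_settings", "old_router", "legacy_db", "build.sh"}
flagged = {"load_settings", "unrelated_name"}
print(checker_recall(expert, flagged))   # prints 0.25
print(misses_majority(expert, flagged))  # prints True
```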

Original abstract

Developers increasingly provide AI coding assistants with persistent context through configuration files such as CLAUDE.md, AGENTS.md, and .cursorrules. These files describe code elements, architecture, and development conventions, forming the context that guides AI tool behavior across sessions. As software evolves, this context can become stale, a phenomenon we call context rot. While AI configuration artifacts are new, the underlying consistency problem connects to decades of software documentation research. Researchers have built tools to check consistency between documentation and code, spanning README files, code comments, API documentation, architecture descriptions, and installation instructions. We argue that this existing toolbox is an immediate starting point for detecting context rot, and we present a research roadmap mapping documentation consistency approaches to corresponding problems in this new setting. As preliminary evidence, applying an existing README/wiki consistency checker to a statistically representative sample of 356 repositories identifies stale code element references in 23.0% of repositories, showing that traditional documentation consistency tools can already surface context rot.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The authors name context rot for AI config files, but the reported 23% stale rate covers only READMEs and wikis, not the new artifacts.

The paper's main contribution is naming the staleness problem in persistent AI context files like CLAUDE.md as context rot and sketching a roadmap to adapt existing documentation consistency tools to it. The authors connect the issue to decades of prior work on READMEs, comments, and API docs, which is a reasonable move.

They report one concrete measurement: running an existing checker on a sample of 356 repositories turned up stale code-element references in 23% of them. This gives a sense that consistency problems occur at a noticeable rate in open-source projects.

The soft spot is that the measurement was done only on traditional documentation. No results are shown for AI configuration files themselves, so the claim that the old tools can already surface context rot rests on the assumption that the problems and detection methods transfer directly. That assumption is stated but not tested in the data presented.

The paper is a short conceptual piece aimed at researchers working on AI-assisted development tools and documentation analysis. Readers interested in early framing of maintenance issues for coding assistants will get something out of the roadmap and the literature pointers. It is not a full empirical study.

I would bring it to a reading group to talk through the proposed mappings. I would not cite it in my own work at this stage. It could reasonably go to peer review as a short note if the venue wants to surface emerging practical problems, though the evidence would need to address the target artifacts more directly.

Referee Report

2 major / 2 minor

Summary. The paper defines 'context rot' as staleness in persistent AI configuration files (e.g., CLAUDE.md) that supply context to coding assistants. It connects the problem to decades of documentation consistency research, argues that existing tools for READMEs, wikis, and similar artifacts provide an immediate starting point for detection, presents a research roadmap mapping prior approaches to the new setting, and supplies preliminary evidence by applying an existing checker to a sample of 356 repositories, finding stale code-element references in 23.0% of them.

Significance. If the proposed analogy between traditional documentation and AI configuration artifacts holds, the work could accelerate tool development for a timely problem in AI-assisted software engineering by directly leveraging mature consistency-checking techniques. The observational measurement supplies concrete, if limited, support for the prevalence of staleness issues in documentation.

major comments (2)

[Abstract] Abstract: the claim that the 23.0% stale-reference rate shows 'traditional documentation consistency tools can already surface context rot' is unsupported by the reported evidence, because the checker was applied exclusively to README/wiki files; no measurements on AI configuration artifacts are presented, yet context rot is defined specifically for the latter.
[Abstract] Abstract (preliminary evidence paragraph): the 23.0% figure is presented without error bars, repository-sampling methodology, inter-rater validation of stale references, or any test of heuristic transfer to AI config files, weakening its role as support for the repurposing claim.
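
The missing error bars can be approximated from the two numbers the abstract does report. The following sketch computes a Wilson score interval under two assumptions that are not stated in the source: that 23.0% corresponds to 82 of the 356 repositories (a count inferred here by rounding), and that the repositories were drawn by simple random sampling.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


ASSUMED_STALE_REPOS = 82  # assumption: inferred from 23.0% of 356, not reported
SAMPLE_SIZE = 356         # from the abstract
low, high = wilson_interval(ASSUMED_STALE_REPOS, SAMPLE_SIZE)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the 95% interval is roughly 19% to 28%. It captures sampling uncertainty only and says nothing about how accurately the checker labels a reference as stale.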

minor comments (2)

[Abstract] Abstract: add a parenthetical citation or one-sentence description of the specific consistency checker used for the 356-repository study.
[Roadmap] Roadmap section: clarify whether any of the mapped techniques require modification before application to AI artifacts or are claimed to transfer unchanged.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive critique of the abstract. The comments correctly identify that the preliminary evidence is indirect and that the abstract's phrasing requires tightening. We will revise the abstract and related text to clarify the scope and limitations of the 23.0% measurement while preserving the core argument that existing documentation-consistency techniques supply a ready starting point. Below we respond point by point.

Referee: [Abstract] Abstract: the claim that the 23.0% stale-reference rate shows 'traditional documentation consistency tools can already surface context rot' is unsupported by the reported evidence, because the checker was applied exclusively to README/wiki files; no measurements on AI configuration artifacts are presented, yet context rot is defined specifically for the latter.

Authors: We agree that the abstract sentence overstates the direct applicability. The 23.0% result demonstrates that an existing checker can detect stale code-element references in conventional documentation; it does not constitute a measurement performed on CLAUDE.md-style artifacts. The manuscript's central claim is that the analogy is close enough to justify repurposing the prior techniques, not that the same tool has already been shown to work on AI configuration files. We will revise the abstract to read that the measurement 'illustrates that mature consistency checkers already exist and can surface staleness in documentation that shares structural similarities with AI configuration artifacts,' and we will add an explicit caveat that direct evaluation on AI configuration files remains future work.

Revision: yes

Referee: [Abstract] Abstract (preliminary evidence paragraph): the 23.0% figure is presented without error bars, repository-sampling methodology, inter-rater validation of stale references, or any test of heuristic transfer to AI config files, weakening its role as support for the repurposing claim.

Authors: The manuscript states that the sample is 'statistically representative,' but we accept that the abstract supplies insufficient methodological detail. We will expand the abstract sentence to include a brief description of the sampling frame and will add a short methods paragraph (or footnote) that reports the repository-selection criteria and notes the absence of inter-rater validation and direct transfer testing. Because the evidence is labeled 'preliminary,' we will also insert a sentence acknowledging these limitations rather than claiming the figure alone proves transfer. Full replication details can be placed in an appendix if space allows.

Revision: partial

Circularity Check

0 steps flagged

No circularity; observational result on traditional docs is independent

The paper's central step is an empirical measurement: an existing README/wiki consistency checker applied to 356 repositories yields a 23% stale code-element reference rate. This is a direct observation on traditional documentation and does not reduce by construction to any quantity defined inside the paper, any fitted parameter, or a self-citation chain. The subsequent claim that the same tools 'can already surface context rot' is an analogy argument resting on the untested transferability assumption, but that analogy is not presented as a derivation or prediction that equals its inputs. No equations, self-definitional loops, renamed known results, or load-bearing self-citations appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that AI configuration artifacts exhibit consistency problems analogous to traditional documentation, with the invented entity being the named phenomenon of context rot; no free parameters are introduced.

axioms (1)

Domain assumption: AI configuration artifacts such as CLAUDE.md exhibit consistency problems analogous to those in traditional software documentation.
This underpins the argument that the existing toolbox of documentation consistency approaches is an immediate starting point for detecting context rot.

invented entities (1)

context rot (no independent evidence)
Purpose: to name and frame the phenomenon of staleness in persistent AI context files as software evolves.
New term introduced to organize the problem and motivate the roadmap; no independent evidence provided outside the paper.

Reference graph

Works this paper leans on

19 extracted references · 7 canonical work pages

[1] Nour Ali, Sean Baker, Ross O’Crowley, Sebastian Herold, and Jim Buckley. 2018. Architecture Consistency: State of the Practice, Challenges and Requirements. Empirical Software Engineering 23, 1 (2018), 224–258.

[2] Sebastian Baltes, Seyedmoein Mohsenimofidi, Levi Böhme, Jai Lulla, Muhammad Auwal Abubakar, Christoph Treude, and Matthias Galster. 2026. A Dataset of Agentic AI Coding Tool Configurations. doi:10.5281/zenodo.19375880

[3] Nemania Borovits, Indika Kumara, Dario Di Nucci, Parvathy Krishnan, Stefano Dalla Palma, Fabio Palomba, Damian A Tamburri, and Willem-Jan van den Heuvel. 2022. FindICI: Using Machine Learning to Detect Linguistic Inconsistencies Between Code and Natural Language Descriptions in Infrastructure-as-Code. Empirical Software Engineering 27, 7 (2022), 178.

[4] Matthias Galster, Seyedmoein Mohsenimofidi, Levi Böhme, Jai Lal Lulla, Muhammad Auwal Abubakar, Christoph Treude, and Sebastian Baltes. 2026. A Dataset of Agentic AI Coding Tool Configurations. In Proceedings of the 3rd ACM International Conference on AI-Powered Software (Montreal, Canada) (AIware ’26). ACM, New York, NY, USA. To appear.

[5] Matthias Galster, Seyedmoein Mohsenimofidi, Jai Lal Lulla, Muhammad Auwal Abubakar, Christoph Treude, and Sebastian Baltes. 2026. Configuring Agentic AI Coding Tools: An Exploratory Study. In Proceedings of the 3rd ACM International Conference on AI-Powered Software (Montreal, Canada) (AIware ’26). ACM, New York, NY, USA. To appear.

[6] Haoyu Gao, Christoph Treude, and Mansooreh Zahedi. 2025. Adapting Installation Instructions in Rapidly Evolving Software Ecosystems. IEEE Transactions on Software Engineering 51, 4 (2025), 1334–1357. doi:10.1109/TSE.2025.3552614

[7] Kelly Hong, Anton Troynikov, and Jeff Huber. 2025. Context Rot: How Increasing Input Tokens Impacts LLM Performance. Technical Report. Chroma. https://research.trychroma.com/context-rot

[8] Jan Keim, Sophie Corallo, Dominik Fuchß, and Anne Koziolek. 2023. Detecting Inconsistencies in Software Architecture Documentation Using Traceability Link Recovery. In 2023 IEEE 20th International Conference on Software Architecture (ICSA) (L’Aquila, Italy). IEEE, Piscataway, NJ, USA, 141–152. doi:10.1109/ICSA56044.2023.00021

[9] Jai Lal Lulla, Seyedmoein Mohsenimofidi, Matthias Galster, Jie M. Zhang, Sebastian Baltes, and Christoph Treude. 2026. On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents. In Proceedings of the Journal Ahead Workshop (Rio de Janeiro, Brazil) (JAWs ’26). ACM, New York, NY, USA. To appear.

[10] Seyedmoein Mohsenimofidi, Matthias Galster, Christoph Treude, and Sebastian Baltes. 2026. Context Engineering for AI Agents in Open-Source Software. In Proceedings of the 23rd International Conference on Mining Software Repositories (Rio de Janeiro, Brazil) (MSR ’26). ACM, New York, NY, USA. To appear.

[11, 12] Sheena Panthaplackel, Junyi Jessy Li, Milos Gligoric, and Raymond J Mooney. 2021. Deep Just-In-Time Inconsistency Detection Between Comments and Source Code. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. AAAI Press, Palo Alto, California, USA, 427–435. Issue 1. doi:10.1609/aaai.v35i1.16119

[13] Lin Tan, Ding Yuan, Gopal Krishna, and Yuanyuan Zhou. 2007. iComment: Bugs or Bad Comments?. In Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles. ACM, New York, NY, USA, 145–158.

[14] Shin Hwei Tan, Darko Marinov, Lin Tan, and Gary T Leavens. 2012. @tComment: Testing Javadoc Comments to Detect Comment-Code Inconsistencies. In 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation (Montreal, QC, Canada). IEEE, Piscataway, NJ, 260–269. doi:10.1109/ICST.2012.106

[15] Wen Siang Tan, Markus Wagner, and Christoph Treude. 2023. Wait, wasn’t that code here before? Detecting Outdated Software Documentation. In Proceedings of the International Conference on Software Maintenance and Evolution (Bogotá, Colombia). IEEE, Los Alamitos, CA, 553–557. doi:10.1109/ICSME58846.2023.00071

[16] Wen Siang Tan, Markus Wagner, and Christoph Treude. 2024. Detecting Outdated Code Element References in Software Repository Documentation. Empirical Software Engineering 29, 1 (2024), 5.

[17] Christoph Treude and Sebastian Baltes. 2026. Online Appendix: Context Rot in AI-Assisted Software Development. doi:10.5281/zenodo.20588740

[18] Gias Uddin and Martin P Robillard. 2015. How API Documentation Fails. IEEE Software 32, 4 (2015), 68–75.

[19] Yu Zhou, Ruihang Gu, Taolue Chen, Zhiqiu Huang, Sebastiano Panichella, and Harald Gall. 2017. Analyzing APIs Documentation and Code to Detect Directive Defects. In 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE) (Buenos Aires, Argentina). IEEE, Piscataway, NJ, USA, 27–37. doi:10.1109/ICSE.2017.11
