Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

Mark Müller; Martin Vechev; Niels Mündler; Thibaud Gloaguen; Veselin Raychev

arxiv: 2602.11988 · v2 · pith:UUPF6ZJInew · submitted 2026-02-12 · 💻 cs.SE · cs.AI


Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, Martin Vechev


keywords context · files · agents · coding · repositories · while · although · developer-committed


A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md. Although this practice is strongly encouraged by agent developers, there is currently no rigorous investigation into whether such context files are actually effective for real-world tasks. In this work, we study this question and evaluate coding agents' task completion performance in two complementary settings: established SWE-bench tasks from popular repositories, with LLM-generated context files, and a novel collection of issues from repositories containing developer-committed context files. Surprisingly, we find that providing context files does not generally improve task success rates, while increasing inference cost by over 20% on average. This observation holds across different LLMs, coding agents, and for both LLM-generated and developer-committed context files. Specifically, we find that while instructions in the context files are well followed by coding agents, repository overviews, although popular and recommended by model providers, are not helpful. We conclude that while context files are useful for specifying non-standard coding practices, any attempts to improve performance should be rigorously evaluated before deployment.


Forward citations

Cited by 11 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Knowledge-Based Pull Requests: A Trusted Workflow for Agent-Mediated Knowledge Collaboration
cs.SE 2026-06 unverdicted novelty 7.0

Introduces Knowledge-Based Pull Requests as a workflow that separates knowledge acceptance from code merge using agent distillation and project-side regeneration.

BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge
cs.SE 2026-05 unverdicted novelty 7.0

BootstrapAgent distills repository bootstrapping heuristics into a persistent .bootstrap contract via multi-agent evidence extraction, Docker verification, and trace-driven repair, reporting 92.9% success and efficien...

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization
cs.SE 2026-05 unverdicted novelty 7.0

PerfCodeBench reveals that state-of-the-art LLMs produce functionally correct but significantly slower code than expert-optimized versions on system-level tasks, especially those involving parallelism and GPUs.

Rule Taxonomy and Evolution in AI IDEs: A Mining and Survey Study
cs.SE 2026-06 unverdicted novelty 6.0

Mixed-methods study creates taxonomy of AI IDE rules from 7310 instances, analyzes evolution drivers, and reports that rule updates raise average artifact compliance from 49.14% to 72.13%.

Coding Agents Don't Know When to Act
cs.SE 2026-05 unverdicted novelty 6.0

Coding agents exhibit action bias by proposing undesirable changes on already-fixed issues 35-65% of the time, and explicit reproduction instructions only partially mitigate this while creating new abstention errors.

ZORO: Active Rules for Reliable Vibe Coding
cs.HC 2026-04 unverdicted novelty 6.0

ZORO integrates rules directly into AI coding workflows by enriching plans, enforcing compliance with proof requirements, and evolving rules via user feedback, resulting in better rule adherence and shifts in user behavior.

Instruction Adherence in Coding Agent Configuration Files: A Factorial Study of Four File-Structure Variables
cs.SE 2026-05 unverdicted novelty 5.0

A 1650-session factorial study found no measurable impact from config file size, instruction position, architecture, or conflicts on coding agent adherence, though compliance declined within sessions.

From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution
cs.SE 2026-04 unverdicted novelty 5.0

Compact Gene representations of experience outperform documentation-oriented Skill packages for test-time control and iterative evolution in code-solving tasks, with measured gains on CritPt from 9.1% to 18.57% and 17...

Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development
cs.SE 2026-05 unverdicted novelty 4.0

Agentic Agile-V uses Agile-V as backbone and a Specify-Constrain-Orchestrate-Prove-Evolve-Verify loop to convert AI agent conversations into traceable engineering artifacts with acceptance evidence.

Bridging the Gap on AI-Assisted Scientific Software Development Through Transparency and Traceability
cs.SE 2026-05 conditional novelty 4.0

Proposes guidance for responsible AI use in scientific software development under NQA-1 standards, illustrated with TMAP8 V&V cases to ensure accountability and auditability.

Agentic AI-assisted coding offers a unique opportunity to instill epistemic grounding during software development
cs.SE 2026-04 unverdicted novelty 4.0

The authors propose field-scoped epistemic grounding documents that override user prompts with non-negotiable validity rules for AI-assisted scientific software development.