Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?
A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md. Although this practice is strongly encouraged by agent developers, there is currently no rigorous investigation into whether such context files are actually effective for real-world tasks. In this work, we study this question and evaluate coding agents' task completion performance in two complementary settings: established SWE-bench tasks from popular repositories, with LLM-generated context files, and a novel collection of issues from repositories containing developer-committed context files. Surprisingly, we find that providing context files does not generally improve task success rates, while increasing inference cost by over 20% on average. This observation holds across different LLMs, coding agents, and for both LLM-generated and developer-committed context files. Specifically, we find that while instructions in the context files are well followed by coding agents, repository overviews, although popular and recommended by model providers, are not helpful. We conclude that while context files are useful for specifying non-standard coding practices, any attempts to improve performance should be rigorously evaluated before deployment.
Forward citations
Cited by 11 Pith papers
- Knowledge-Based Pull Requests: A Trusted Workflow for Agent-Mediated Knowledge Collaboration
  Introduces Knowledge-Based Pull Requests as a workflow that separates knowledge acceptance from code merge using agent distillation and project-side regeneration.
- BootstrapAgent: Distilling Repository Setup into Reusable Agent Knowledge
  BootstrapAgent distills repository bootstrapping heuristics into a persistent .bootstrap contract via multi-agent evidence extraction, Docker verification, and trace-driven repair, reporting 92.9% success and efficien...
- PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization
  PerfCodeBench reveals that state-of-the-art LLMs produce functionally correct but significantly slower code than expert-optimized versions on system-level tasks, especially those involving parallelism and GPUs.
- Rule Taxonomy and Evolution in AI IDEs: A Mining and Survey Study
  Mixed-methods study creates taxonomy of AI IDE rules from 7310 instances, analyzes evolution drivers, and reports that rule updates raise average artifact compliance from 49.14% to 72.13%.
- Coding Agents Don't Know When to Act
  Coding agents exhibit action bias by proposing undesirable changes on already-fixed issues 35-65% of the time, and explicit reproduction instructions only partially mitigate this while creating new abstention errors.
- ZORO: Active Rules for Reliable Vibe Coding
  ZORO integrates rules directly into AI coding workflows by enriching plans, enforcing compliance with proof requirements, and evolving rules via user feedback, resulting in better rule adherence and shifts in user behavior.
- Instruction Adherence in Coding Agent Configuration Files: A Factorial Study of Four File-Structure Variables
  A 1650-session factorial study found no measurable impact from config file size, instruction position, architecture, or conflicts on coding agent adherence, though compliance declined within sessions.
- From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution
  Compact Gene representations of experience outperform documentation-oriented Skill packages for test-time control and iterative evolution in code-solving tasks, with measured gains on CritPt from 9.1% to 18.57% and 17...
- Agentic Agile-V: From Vibe Coding to Verified Engineering in Software and Hardware Development
  Agentic Agile-V uses Agile-V as backbone and a Specify-Constrain-Orchestrate-Prove-Evolve-Verify loop to convert AI agent conversations into traceable engineering artifacts with acceptance evidence.
- Bridging the Gap on AI-Assisted Scientific Software Development Through Transparency and Traceability
  Proposes guidance for responsible AI use in scientific software development under NQA-1 standards, illustrated with TMAP8 V&V cases to ensure accountability and auditability.
- Agentic AI-assisted coding offers a unique opportunity to instill epistemic grounding during software development
  The authors propose field-scoped epistemic grounding documents that override user prompts with non-negotiable validity rules for AI-assisted scientific software development.