EET: Experience-Driven Early Termination for Cost-Efficient Software Engineering Agents

Jie M. Zhang; Mark Harman; Yang Liu; Yaoqi Guo; Yiling Lou; Ying Xiao; Zhenpeng Chen

arxiv: 2601.05777 · v2 · submitted 2026-01-09 · 💻 cs.SE

Yaoqi Guo, Ying Xiao, Jie M. Zhang, Mark Harman, Yiling Lou, Yang Liu, Zhenpeng Chen

Pith reviewed 2026-05-16 16:11 UTC · model grok-4.3

classification 💻 cs.SE

keywords software engineering agents · cost efficiency · early termination · patch generation · SWE-bench · large language models · experience reuse · agent optimization

The pith

Software engineering agents can cut costs by 32 percent on average by using past resolution experience to stop patch generation early.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EET, which extracts structured experience from completed issue resolutions and uses it to decide when an agent should stop generating and selecting patches for a new problem. This targets the repeated unproductive iterations that drive up token and API costs in LLM-based software engineering agents. Evaluation on the SWE-bench Verified benchmark with three different agents shows consistent savings of 19 to 55 percent in total cost while losing at most 0.2 percent in resolution rate. On average, the gains come from identifying early-termination opportunities on 11 percent of issues and from cutting API calls by 21 percent, input tokens by 30 percent, and output tokens by 25 percent.

Core claim

EET shows that structured experience extracted from prior issue-resolution executions can reliably guide early termination during patch generation and selection, delivering 19-55 percent cost reductions with at most 0.2 percent loss in resolution rate across three representative agents on SWE-bench Verified.

What carries the argument

An experience-driven early termination policy that extracts structured lessons from past executions and reapplies them to halt patch generation and selection once further iterations look unproductive.
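
As a rough illustration of the control flow such a policy implies, the sketch below caps a patch-generation loop with a budget taken from the most similar past experience. Everything in it is an assumption made for illustration: the `Experience` fields, the keyword-based `jaccard` matcher, the `min_similarity` threshold, and the `run_agent` loop are hypothetical and are not taken from the paper, whose abstract does not say how experience is represented or matched.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Experience:
    """Hypothetical record distilled from one past issue-resolution run."""
    keywords: FrozenSet[str]   # terms describing the past issue (assumed field)
    useful_iterations: int     # iterations after which no progress was seen (assumed field)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Toy similarity between two keyword sets; stands in for any real matcher."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def iteration_budget(issue_keywords: FrozenSet[str],
                     store: List[Experience],
                     default_budget: int,
                     min_similarity: float = 0.5) -> int:
    """Return the budget suggested by the most similar past experience.

    Falls back to default_budget when nothing in the store is similar enough.
    """
    best = max(store, key=lambda e: jaccard(issue_keywords, e.keywords), default=None)
    if best is None or jaccard(issue_keywords, best.keywords) < min_similarity:
        return default_budget
    return min(default_budget, best.useful_iterations)


def run_agent(issue_keywords: FrozenSet[str],
              store: List[Experience],
              generate_patch: Callable[[int], object],
              is_acceptable: Callable[[object], bool],
              default_budget: int = 10) -> Tuple[Optional[object], int]:
    """Generate patches until one is acceptable or the budget is exhausted."""
    budget = iteration_budget(issue_keywords, store, default_budget)
    for step in range(1, budget + 1):
        patch = generate_patch(step)
        if is_acceptable(patch):
            return patch, step
    return None, budget  # terminated early relative to default_budget if budget is smaller
```

The design choice worth noting is the fallback: when no stored experience is similar enough, the agent keeps its full default budget, so a poor match cannot shorten a run. Whether EET behaves this way is not stated in the abstract.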

If this is right

Average reductions of 21 percent in API calls, 30 percent in input tokens, and 25 percent in output tokens.
Early termination opportunities identified for 11 percent of issues on average.
Cost savings hold across multiple distinct SE agent implementations.
Task success rate remains essentially unchanged while total monetary cost drops substantially.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same experience extraction approach could be tested on agent tasks outside issue fixing, such as code review or test generation.
Continuously updating the experience store with newly resolved issues might increase savings over time.
Experiences collected from one agent could transfer to other agents or models without retraining.

Load-bearing premise

Structured experience from earlier issues generalizes safely to new issues and can trigger early termination without missing viable patches.

What would settle it

A fresh benchmark run where applying EET causes the resolution rate to fall more than 0.2 percent below the baseline agent rate.
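
The falsification test above can be written as a small check. This is a minimal sketch that assumes the 0.2 percent figure means absolute percentage points over the 500 SWE-bench Verified instances, in which case a single issue corresponds to exactly 0.2 points. The function name, its parameters, and the resolved counts in the usage lines are hypothetical and are not results from the paper.

```python
def violates_tolerance(baseline_resolved: int,
                       eet_resolved: int,
                       total: int = 500,
                       tolerance_points: float = 0.2) -> bool:
    """Return True if EET's resolution rate falls more than tolerance_points
    (in percentage points) below the baseline agent's rate."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Work from the integer difference to avoid subtracting two rounded rates.
    drop_points = 100.0 * (baseline_resolved - eet_resolved) / total
    return drop_points > tolerance_points


# Hypothetical counts for illustration only.
print(violates_tolerance(300, 299))  # one issue lost out of 500 is within tolerance
print(violates_tolerance(300, 298))  # two issues lost exceeds the tolerance
```

Under this reading, the claim tolerates the loss of at most one resolved issue per agent on the 500-instance benchmark.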

Original abstract

Software engineering (SE) agents powered by large language models are increasingly adopted in practice, yet they often incur substantial monetary cost. We introduce EET, an experience-driven early termination approach that reduces the cost of SE agents while preserving task performance. EET extracts structured experience from prior issue-resolution executions and leverages it to guide early termination during patch generation and selection, reducing unproductive iterations. We evaluate EET on the SWE-bench Verified benchmark across three representative SE agents. EET consistently reduces total cost by 19%-55% (32% on average), with negligible loss in resolution rate (at most 0.2%). These efficiency gains are achieved, on average, by identifying early-termination opportunities for 11% of issues and reducing API calls, input tokens, and output tokens by 21%, 30%, and 25%, respectively. We release the code, prompts, and data at https://github.com/IanWalls/EET.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces EET, an experience-driven early termination method for LLM-powered software engineering agents. It extracts structured experience from prior issue-resolution executions to trigger early stopping during patch generation and selection, thereby reducing unproductive iterations and monetary cost. On the SWE-bench Verified benchmark across three representative agents, EET reports consistent total-cost reductions of 19-55% (32% average) while incurring at most a 0.2% drop in resolution rate; these gains arise from early termination on 11% of issues and corresponding reductions in API calls (21%), input tokens (30%), and output tokens (25%). The authors release code, prompts, and data.

Significance. If the experience store is constructed from issues strictly disjoint from the evaluation set and the termination policy generalizes, the result would be practically significant: it demonstrates a lightweight, experience-based mechanism that can substantially lower the deployment cost of SE agents without materially harming task success. The empirical scale (three agents, standard benchmark) and public release of artifacts strengthen the contribution if the data-provenance concern is resolved.

major comments (2)

[Abstract] The implied claim that structured experience generalizes to new issues and can safely trigger termination rests on an unstated assumption: that the prior executions are drawn from issues disjoint from the 500 SWE-bench Verified instances. The abstract gives no information on the provenance of the experience data, the matching procedure, or overlap statistics. If any overlap exists, the reported 19-55% cost savings and ≤0.2% resolution loss could be artifacts of in-distribution early stopping rather than out-of-distribution generalization.
[Abstract] Abstract and Evaluation sections: the paper provides no description of how experience is represented (e.g., what fields are stored, how similarity is computed), what concrete termination thresholds are used, or what controls are applied to guard against selection bias among the 11% of issues terminated early. These omissions make it impossible to reproduce the exact cost-reduction figures or to assess whether the negligible resolution-rate loss is robust.

minor comments (1)

The abstract states that 'we release the code, prompts, and data'; the repository should explicitly include the exact experience-extraction scripts, the list of issues used to populate the experience store, and the train/test split used for evaluation so that reviewers can verify disjointness.
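
The disjointness check requested here is simple to run once both lists of issue identifiers are published. The sketch below is one way to do it, under the assumption that the experience store and the evaluation set are each identified by a list of instance IDs. The `overlap` and `normalize` helpers and the example IDs are hypothetical and do not come from the EET repository.

```python
from typing import Iterable, Set


def normalize(instance_id: str) -> str:
    """Trim whitespace and lowercase so formatting differences do not hide overlap."""
    return instance_id.strip().lower()


def overlap(experience_ids: Iterable[str], eval_ids: Iterable[str]) -> Set[str]:
    """Return the IDs that appear in both the experience store and the evaluation set."""
    return {normalize(i) for i in experience_ids} & {normalize(i) for i in eval_ids}


# Made-up identifiers for illustration only.
shared = overlap(["org__repo-1", " Org__Repo-2 "], ["org__repo-2", "org__repo-3"])
print(sorted(shared))
```

An empty result supports the disjointness assumption at the level of issue IDs. It does not rule out softer leakage, such as near-duplicate issues from the same repository.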

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for highlighting important points regarding the clarity of our claims and the reproducibility of our method. We will make revisions to address both major comments.

Referee: [Abstract] The implied claim that structured experience generalizes to new issues and can safely trigger termination rests on an unstated assumption: that the prior executions are drawn from issues disjoint from the 500 SWE-bench Verified instances. The abstract gives no information on the provenance of the experience data, the matching procedure, or overlap statistics. If any overlap exists, the reported 19-55% cost savings and ≤0.2% resolution loss could be artifacts of in-distribution early stopping rather than out-of-distribution generalization.

Authors: We agree with the referee that the abstract should make the assumption explicit. In the revised version, we will update the abstract to state that the experience data is constructed from prior executions on issues disjoint from the SWE-bench Verified instances. We will also add a description of the provenance, matching procedure, and overlap statistics in the Evaluation section to substantiate the generalization claim.
revision: yes

Referee: [Abstract] Abstract and Evaluation sections: the paper provides no description of how experience is represented (e.g., what fields are stored, how similarity is computed), what concrete termination thresholds are used, or what controls are applied to guard against selection bias among the 11% of issues terminated early. These omissions make it impossible to reproduce the exact cost-reduction figures or to assess whether the negligible resolution-rate loss is robust.

Authors: We acknowledge these omissions in the current manuscript. We will expand the abstract and Evaluation sections to describe how experience is represented, the similarity computation method, the specific termination thresholds used, and the controls implemented to avoid selection bias. These additions will enable reproduction of the cost-reduction figures and assessment of the robustness of the resolution rate.
revision: yes

Circularity Check

0 steps flagged

No circularity: empirical cost reductions measured on external benchmark

The paper describes an empirical method that extracts structured experience from prior issue-resolution executions and applies it to trigger early termination in SE agents. Evaluation is performed on the SWE-bench Verified benchmark, reporting measured reductions in API calls, tokens, and total cost (19-55%). No equations, fitted parameters, or derivations are present that reduce the claimed savings to quantities defined by the experience data itself. The generalization claim is tested via direct measurement rather than by construction or self-referential definition, making the derivation chain self-contained against the external benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes that past execution traces contain transferable signals for termination decisions.

pith-pipeline@v0.9.0 · 5476 in / 923 out tokens · 87127 ms · 2026-05-16T16:11:26.995951+00:00
