ContextSniper: AntTrail's Token-Efficient Code Memory for Repository-Level Program Repair

Chiwang Luk; Gao Cong; Jinwei Zhu; Lei Chen; Matin Mohammad Najafi; Wei Yang; Xiuchang Li; Yang Ren; Zhifeng Jia

arxiv: 2607.01916 · v1 · pith:W6Z7IQTNnew · submitted 2026-07-02 · 💻 cs.AI

Pith reviewed 2026-07-03 14:06 UTC · model grok-4.3

classification 💻 cs.AI

keywords ContextSniper · token-efficient memory · program repair · LLM agents · SWE-bench Lite · code memory · repository-level repair · context selection

The pith

ContextSniper equips repository repair agents with a precision evidence selector that cuts token use by up to 51.5 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ContextSniper as a token-efficient code memory layer for repository-level program repair by large language model agents. It implements the Sniper feature that retrieves candidate code and runtime evidence, ranks it with hybrid signals, filters outputs through an intention-aware context gate, and returns compact evidence packets while keeping source context recoverable outside the prompt. Evaluation on SWE-bench Lite shows token reductions of 51.5 percent for one agent and 38.9 percent for another, with corresponding cost drops and only small declines in resolution rates. A reader would care because high context consumption limits how far these agents can scale on real codebases without excessive expense.

Core claim

ContextSniper implements the Sniper feature for precision evidence selection: it retrieves candidate code and runtime evidence, ranks it with hybrid retrieval signals, filters long outputs through an intention-aware context gate, and returns compact evidence packets while preserving recoverable source context outside the prompt. On SWE-bench Lite with 50 task runs per condition, this yields total token reductions of 51.5 percent and logged cost reductions of 36.4 percent for OpenClaw, plus 38.9 percent token and 27.3 percent estimated cost reductions for Claude Code. Submitted-resolution rates decrease only slightly, from 26.0 percent to 24.0 percent for OpenClaw and from 32.0 percent to 30.0 percent for Claude Code.

What carries the argument

The Sniper feature, which retrieves candidate code and runtime evidence, ranks it with hybrid retrieval signals, filters long outputs through an intention-aware context gate, and returns compact evidence packets.
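
The summary does not say how the hybrid retrieval signals are combined. As a minimal sketch, assuming the signals arrive as separate ranked lists of snippet identifiers, reciprocal rank fusion is one widely used way to merge them. The function name, the constant `k=60`, and the snippet identifiers are illustrative assumptions, not ContextSniper's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of snippet ids into one fused ranking.

    Each list contributes 1 / (k + rank) to an item's score, so items that
    rank well in more than one signal rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical outputs of a lexical signal and a semantic signal.
lexical = ["a.py:10", "b.py:5", "c.py:1"]
semantic = ["b.py:5", "d.py:7", "a.py:10"]

fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)
```

Rank-based fusion avoids having to calibrate scores across signals that live on different scales, which is why it is a common default for hybrid retrieval.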

If this is right

Total token use drops by 51.5 percent for OpenClaw and 38.9 percent for Claude Code on SWE-bench Lite.
Logged or estimated costs decrease by 36.4 percent and 27.3 percent respectively under the same conditions.
Submitted-resolution rates fall only slightly, by 2 percentage points in each tested agent.
Recoverable source context remains available outside the prompt for any needed follow-up.
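
The percentages above can be turned back into task counts, assuming each resolution rate is computed over all 50 task runs of its condition (an assumption; the summary does not state the denominator). Under that assumption, a 2 percentage point drop is a single task per host agent. The helper name is hypothetical.

```python
def resolved_tasks(rate_percent, n_runs=50):
    """Convert a resolution rate in percent into a whole number of tasks."""
    return round(rate_percent * n_runs / 100)


# Rates as reported in the abstract: (baseline, with ContextSniper).
reported = {
    "OpenClaw": (26.0, 24.0),
    "Claude Code": (32.0, 30.0),
}

task_counts = {
    agent: (resolved_tasks(before), resolved_tasks(after))
    for agent, (before, after) in reported.items()
}

for agent, (before, after) in task_counts.items():
    print(f"{agent}: {before} -> {after} resolved tasks (difference {before - after})")
```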

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same selection pipeline could support extended agent sessions on larger repositories before context limits are reached.
Comparable filtering steps might reduce costs in other LLM agent workflows that scan full codebases, such as debugging or test generation.
Public release of the pilot testing scripts enables direct checks of the reported token and cost figures on additional tasks.

Load-bearing premise

The Sniper feature's retrieval, hybrid ranking, and intention-aware context gate preserve all evidence required for the agent to maintain near-baseline resolution rates without discarding critical information.

What would settle it

A set of SWE-bench Lite tasks where the baseline agent succeeds by using a code snippet or log entry that the context gate removes would produce a substantially larger drop in resolution rate than the observed 2 percentage points.
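
One way to run this check, sketched under assumptions: suppose each task record lists whether the baseline agent resolved it, which evidence items the baseline's successful run relied on, and which items the context gate kept. None of these fields come from the paper or its scripts; the record layout, names, and sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    task_id: str
    baseline_resolved: bool
    evidence_used: set = field(default_factory=set)  # items the baseline fix relied on
    evidence_kept: set = field(default_factory=set)  # items the gate let through


def tasks_with_dropped_evidence(records):
    """Return baseline-resolved tasks where the gate removed needed evidence."""
    flagged = {}
    for rec in records:
        if not rec.baseline_resolved:
            continue  # only baseline successes can flip to failure
        missing = rec.evidence_used - rec.evidence_kept
        if missing:
            flagged[rec.task_id] = missing
    return flagged


# Made-up records for illustration only.
records = [
    TaskRecord("task-1", True, {"utils.py:40-60"}, {"utils.py:40-60", "models.py:1-20"}),
    TaskRecord("task-2", True, {"log:traceback"}, {"views.py:5-30"}),
    TaskRecord("task-3", False, {"forms.py:10-25"}, set()),
]

flagged = tasks_with_dropped_evidence(records)
print(flagged)
```

Comparing the resolution rate on the flagged subset against the rest would show whether the drop concentrates where the gate removed evidence.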

Figures

Figures reproduced from arXiv: 2607.01916 by Chiwang Luk, Gao Cong, Jinwei Zhu, Lei Chen, Matin Mohammad Najafi, Wei Yang, Xiuchang Li, Yang Ren, Zhifeng Jia.

**Figure 1.** ContextSniper turns noisy repository context into compact repair evidence. (A) and (B) share the same host-agent workflow; only the context-acquisition module differs. (A) Native Grep, Read, and Bash outputs accumulate into an unfiltered prompt context, and failed attempts reread similar files. (B) A ContextSniper insertion layer replaces broad discovery with search code excerpts and gates long Read/Bash o…

**Figure 2.** ContextSniper architecture. The upper path shows the baseline interaction loop, where broad search, long file reads, repeated actions, and raw command output accumulate noisy context before reaching the model’s context window. The lower path shows the ContextSniper insertion layer. Repository and action evidence are stored in an AGFS-backed memory hierarchy, synchronized with the current working tree, retr…

**Figure 3.** Two-family L0-L2 memory hierarchy. Action memory stores compact action views, action metadata, and recoverable original tool output. Code memory stores code abstracts, structured routing metadata, and source-grounded chunks. … outputs often contain the needed repair evidence, but they also carry unrelated functions, repeated logs, and stale exploratory context into the model window. ContextSniper keeps the s…

**Figure 4.** Pseudocode for code search with adaptive top-k retrieval. The query planner controls retrieval depth and route budgets, while the final response remains a compact set of source-grounded L2 snippets. … files are re-chunked or refreshed, deleted snippets are invalidated, and newly relevant regions become available for subsequent searches. This phase catches changes that did not pass through a tool hook and pre…
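
The caption names adaptive top-k retrieval, but the pseudocode itself is not reproduced on this page. The sketch below shows one possible reading, assuming positive relevance scores and a cutoff relative to the best score. The thresholds `k_min`, `k_max`, and `ratio`, the function name, and the sample scores are hypothetical and not taken from the paper.

```python
def adaptive_top_k(scored_snippets, k_min=2, k_max=8, ratio=0.6):
    """Keep a variable number of snippets depending on how scores fall off.

    scored_snippets: list of (snippet_id, score) pairs with positive scores.
    Snippets scoring at least `ratio` times the best score are kept, bounded
    below by k_min and above by k_max.
    """
    ranked = sorted(scored_snippets, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    cutoff = ranked[0][1] * ratio
    kept = [pair for pair in ranked[:k_max] if pair[1] >= cutoff]
    if len(kept) < k_min:
        kept = ranked[:k_min]  # always return a small minimum of context
    return kept


# One dominant hit: the result falls back to the k_min floor.
clear_winner = adaptive_top_k(
    [("a.py:10-30", 0.9), ("b.py:5-20", 0.1), ("c.py:1-9", 0.05)]
)

# Several close candidates: retrieval depth widens to include them.
close_race = adaptive_top_k(
    [("a.py:10-30", 0.9), ("b.py:5-20", 0.85), ("c.py:1-9", 0.8), ("d.py:40-60", 0.2)]
)
print(len(clear_winner), len(close_race))
```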

**Figure 5.** Pseudocode for intention-aware filtering and the context gate. The gate keeps action-relevant evidence, records the recoverable original output, and marks shortened views so the host agent can request broader context when needed. Section 3.6 (Agent Integrations): ContextSniper is implemented as an insertion layer for existing coding agents. In the Claude Code setting, it is exposed through Model Context Protocol (MCP…
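
The three behaviors named in the caption (keep relevant evidence, record the original, mark shortened views) can be sketched as follows. This is a simplified stand-in: plain keyword matching replaces the paper's intention-aware filter, an in-memory dict replaces whatever store the system uses, and the function name, parameters, and marker format are all hypothetical.

```python
import hashlib


def gate_output(raw_output, intent_terms, store, max_lines=20):
    """Return a compact view of a long tool output.

    Short outputs pass through unchanged. Long outputs are saved in `store`
    under a content hash so they stay recoverable, and only lines matching
    the current intent terms are returned, followed by a marker line.
    """
    lines = raw_output.splitlines()
    if len(lines) <= max_lines:
        return raw_output

    handle = hashlib.sha256(raw_output.encode("utf-8")).hexdigest()[:12]
    store[handle] = raw_output  # original stays recoverable outside the prompt

    terms = [term.lower() for term in intent_terms]
    kept = [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if any(term in line.lower() for term in terms)
    ][:max_lines]

    body = "\n".join(f"{number}: {line}" for number, line in kept)
    marker = (
        f"[shortened: kept {len(kept)} of {len(lines)} lines; "
        f"full output under handle {handle}]"
    )
    return f"{body}\n{marker}" if body else marker


# Made-up 100-line log with one relevant line at the end.
log = "\n".join(f"line {i}: ok" for i in range(1, 100))
log += "\nValueError: bad input in parse_date"

store = {}
view = gate_output(log, ["ValueError"], store)
print(view)
```

The marker line is what lets a host agent notice that the view was shortened and ask for the stored original by handle.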

**Figure 6.** Within-host efficiency reduction relative to each host agent’s baseline setting. Absolute token and cost values are reported in …

**Figure 7.** Per-task token-usage comparison on matched SWE-bench Lite tasks for OpenClaw and Claude Code. Each point compares the host-agent baseline and ContextSniper for one task. Points below the diagonal indicate tasks where ContextSniper uses fewer tokens. [Plot labels recovered from the adjacent chart: repositories django, sphinx, scikit-learn, psf, sympy, pytest, mwaskom, pylint, pydata, astropy, pallets, matplotlib; axes "Repository" and "Token change vs. baseline (%)".]

**Figure 8.** Repository-level token change on the matched SWE-bench Lite task sets for OpenClaw and Claude Code. Values aggregate total tokens by repository before computing relative change against each host-agent baseline. Negative values indicate token savings from ContextSniper. Section 4.3 (Task-Level Failure Analysis): The matched OpenClaw comparison shows the intended behavior on most tasks: ContextSniper uses fewer tokens …
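
The aggregation order described in the caption matters: summing tokens per repository first weights large tasks more heavily than averaging per-task percentages would. A minimal sketch with made-up token counts follows; the numbers and function names are illustrative, not results from the paper.

```python
from collections import defaultdict


def repo_token_change(records):
    """Relative token change per repository, aggregating totals first.

    records: iterable of (repository, baseline_tokens, treated_tokens).
    Returns percent change; negative values mean fewer tokens than baseline.
    """
    baseline = defaultdict(int)
    treated = defaultdict(int)
    for repo, base_tokens, new_tokens in records:
        baseline[repo] += base_tokens
        treated[repo] += new_tokens
    return {
        repo: (treated[repo] - baseline[repo]) / baseline[repo] * 100
        for repo in baseline
    }


def mean_per_task_change(records, repo):
    """Alternative for contrast: average the per-task percent changes."""
    changes = [(new - base) / base * 100 for r, base, new in records if r == repo]
    return sum(changes) / len(changes)


# Hypothetical per-task token totals.
records = [
    ("django", 100_000, 40_000),  # large task, big saving
    ("django", 20_000, 30_000),   # small task, token increase
    ("sympy", 50_000, 25_000),
]

change = repo_token_change(records)
print(change)
print(mean_per_task_change(records, "django"))
```

In this toy data the aggregated django change is a saving of roughly 42 percent, while the per-task average is only about 5 percent, because the small task's increase counts as much as the large task's saving.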

**Figure 9.** Figure 9 complements this task-level analysis by measuring whether the retriever surfaces the eventual target file within the first returned paths. Low-recall repositories identify where the first failure pattern is more likely: if the target file is absent from the early results, the host agent must either issue more ContextSniper searches or fall back to broader native exploration. The token-increase cases theref…
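
The measurement described here amounts to a hit rate at a cutoff: does the eventual target file appear among the first k returned paths? A minimal sketch, assuming one target file per task; the function name, the cutoff values, and the sample paths are hypothetical.

```python
def recall_at_k(runs, k=5):
    """Fraction of runs whose target file appears in the first k returned paths.

    runs: list of (returned_paths, target_path) pairs, one per task.
    """
    if not runs:
        return 0.0
    hits = sum(1 for returned_paths, target in runs if target in returned_paths[:k])
    return hits / len(runs)


# Made-up retrieval results for three tasks.
runs = [
    (["pkg/a.py", "pkg/b.py", "pkg/c.py"], "pkg/b.py"),
    (["pkg/x.py", "pkg/y.py", "pkg/z.py"], "pkg/z.py"),
    (["pkg/m.py", "pkg/n.py"], "pkg/q.py"),  # target never retrieved
]

for k in (1, 2, 3):
    print(k, recall_at_k(runs, k))
```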

**Figure 10.** Five-task Django pilot comparison with memory and RAG-style integrations. Bar length shows average logged tokens per task. The pilot is exploratory and smaller than the main 50-task OpenClaw comparison; task-resolution counts are reported in the text. Section 5.1 (ContextSniper as Agent Infrastructure): ContextSniper treats code memory and context control as infrastructure for coding agents. Placing the…

Original abstract

Large language model agents can repair real repository issues, but they often spend large context budgets on whole-file reads, broad searches, and long terminal outputs where useful evidence is mixed with irrelevant code and logs. This paper presents ContextSniper, AntTrail's token-efficient code memory layer for repository-level program repair. As the coding specialization of AntTrail's broader agent memory engine, ContextSniper implements the Sniper feature for precision evidence selection: it retrieves candidate code and runtime evidence, ranks it with hybrid retrieval signals, filters long outputs through an intention-aware context gate, and returns compact evidence packets while preserving recoverable source context outside the prompt. We evaluate ContextSniper on SWE-bench Lite with OpenClaw and Claude Code, using 50 task runs per host-agent condition. ContextSniper reduces total token use by 51.5% and logged cost by 36.4% for OpenClaw, and reduces total token use by 38.9% and estimated cost by 27.3% for Claude Code. Submitted-resolution rates decrease slightly, from 26.0% to 24.0% for OpenClaw and from 32.0% to 30.0% for Claude Code. ContextSniper's pilot testing scripts are open-sourced at https://github.com/Calluking/ContextSniper

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ContextSniper reports clear token and cost cuts on SWE-bench Lite but supplies no ablations or failure cases to support the claim that its gate preserves needed evidence.

The main point is that this paper shows a specialized evidence-selection layer inside the AntTrail framework that trims token use by 39-52% and cost by 27-36% across two agents on SWE-bench Lite, with only a 2-point drop in resolution rate.

ContextSniper adds retrieval of candidate code and runtime traces, hybrid ranking, and an intention-aware gate that filters long outputs before packing compact evidence. The authors ran 50 tasks per condition with OpenClaw and Claude Code, measured the savings, and released the pilot scripts. Those numbers and the open code are the concrete deliverable.

The soft spot is exactly the one the stress-test note flags. The headline result depends on the gate never discarding information the agent needs for the tasks that flip from success to failure. The abstract gives only the pipeline description and the aggregate scores; there are no ablations that remove the gate, no per-task breakdowns, no variance numbers, and no examination of the two-percentage-point failures. At n=50 the observed drop sits inside the range that could be sampling noise or could be the direct result of lost context. Without those checks the preservation assumption stays untested.
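
To make the sampling-noise point concrete, here is a rough check using 95 percent Wilson score intervals. It assumes the OpenClaw rates correspond to 13 and 12 resolved tasks out of 50 (26.0 and 24.0 percent of 50 runs) and treats runs as independent, neither of which the abstract confirms. The function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


baseline = wilson_interval(13, 50)      # OpenClaw baseline, assumed 13 of 50
with_sniper = wilson_interval(12, 50)   # OpenClaw with ContextSniper, assumed 12 of 50

overlap = baseline[0] < with_sniper[1] and with_sniper[0] < baseline[1]
print(baseline, with_sniper, overlap)
```

The two intervals overlap over most of their range, so a one-task difference at n=50 cannot by itself separate noise from lost context.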

The work is aimed at engineers who already run LLM agents on repository repair and want lower inference spend. A reader who needs a working example and baseline numbers will get value from the reported deltas and the released scripts. Someone looking for a validated mechanism or limits will find the current version thin.

I would send it for peer review. The benchmark is standard, the savings are large enough to be practically relevant, and the open scripts give referees something concrete to inspect. The review would almost certainly ask for the missing ablations and case analysis, but the empirical report is worth that step.

Referee Report

2 major / 0 minor

Summary. The paper introduces ContextSniper as AntTrail's token-efficient code memory layer specialized for repository-level program repair. It describes a Sniper feature that retrieves candidate evidence, applies hybrid ranking, and uses an intention-aware context gate to produce compact evidence packets. On SWE-bench Lite with 50 task runs per condition, the system is evaluated with OpenClaw and Claude Code hosts; it reports 51.5% and 38.9% reductions in total token use (with corresponding cost reductions of 36.4% and 27.3%) while submitted-resolution rates fall only from 26% to 24% and 32% to 30%, respectively. Pilot scripts are open-sourced.

Significance. If the reported token and cost savings prove robust while preserving resolution rates, the work would offer a practical advance for scaling LLM-based repository repair agents, where context budgets are a primary deployment constraint. The open-sourcing of testing scripts provides a modest reproducibility asset.

major comments (2)

[Abstract] Abstract / Evaluation paragraph: the headline quantitative claims (51.5% token reduction for OpenClaw, 38.9% for Claude Code; 2 pp resolution drops) rest on aggregate numbers from 50 runs per condition but supply no baselines, per-run variance, statistical tests, or ablation results that would confirm the intention-aware context gate never discards evidence required for the tasks that flip from success to failure.
[Abstract] Pipeline description (Sniper feature): the central assumption that hybrid retrieval plus the intention-aware context gate 'preserve recoverable source context' and maintain near-baseline resolution is stated at a high level but is unsupported by mechanism details, failure-case analysis, or an ablation that removes the gate; with only a 2 pp aggregate drop this assumption is load-bearing for the claim that savings come at negligible performance cost.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed feedback on the quantitative claims and the supporting evidence for ContextSniper's mechanisms. We respond point-by-point below and indicate where revisions will be made to address the concerns.

Referee: [Abstract] Abstract / Evaluation paragraph: the headline quantitative claims (51.5% token reduction for OpenClaw, 38.9% for Claude Code; 2 pp resolution drops) rest on aggregate numbers from 50 runs per condition but supply no baselines, per-run variance, statistical tests, or ablation results that would confirm the intention-aware context gate never discards evidence required for the tasks that flip from success to failure.

Authors: The evaluation protocol uses 50 independent task runs per condition to derive the reported aggregates, as stated in the manuscript. We agree that the abstract would benefit from additional statistical context. In the revision we will add per-run standard deviations to the headline figures and include a brief note on sample size and the absence of formal significance testing. The full manuscript contains a task-level failure analysis (Section 4.3) showing that the two-percentage-point drops were attributable to factors outside the context gate; however, no explicit ablation that disables the gate is present, and we will explicitly note this limitation rather than claim such confirmation. revision: partial

Referee: [Abstract] Pipeline description (Sniper feature): the central assumption that hybrid retrieval plus the intention-aware context gate 'preserve recoverable source context' and maintain near-baseline resolution is stated at a high level but is unsupported by mechanism details, failure-case analysis, or an ablation that removes the gate; with only a 2 pp aggregate drop this assumption is load-bearing for the claim that savings come at negligible performance cost.

Authors: Section 3 of the manuscript supplies the mechanism details for hybrid ranking and the intention-aware context gate, including how recoverable context is preserved outside the prompt. The small aggregate drop is offered as supporting evidence for low performance cost, with the open-sourced pilot scripts enabling independent verification. We will revise the abstract to cross-reference these sections and incorporate a concise failure-case summary drawn from the existing manuscript analysis. An ablation that removes the gate is not included and would require new experiments; we will acknowledge this gap in the revised discussion. revision: partial

Circularity Check

0 steps flagged

No circularity: results are direct empirical measurements on benchmark tasks

The paper describes a retrieval/filtering pipeline and reports aggregate token/cost savings plus resolution rates from 50 runs per condition on SWE-bench Lite. No equations, fitted parameters, predictions, or self-citations appear in the provided text. All headline numbers are measured outcomes, not derived quantities that reduce to the inputs by construction. The central claim therefore stands on external benchmark data rather than any self-referential step.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input provides no identifiable free parameters, axioms, or invented entities; the system is described at the level of retrieval, ranking, and gating steps without mathematical formulation.

[1] [1]

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

C. E. Jimenez, J. Yang, A. Wettig, S. Yao, K. Pei, O. Press, and K. R. Narasimhan, “SWE-bench: Can language models resolve real-world GitHub issues?” inInternational Conference on Learning Representations, 2024. [Online]. Available: https://arxiv.org/abs/2310.06770

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. R. Narasimhan, and O. Press, “SWE-agent: Agent-computer interfaces enable automated software engineering,” 2024. [Online]. Available: https://arxiv.org/abs/2405.15793

work page internal anchor Pith review Pith/arXiv arXiv 2024

[3] [3]

Autocoderover: Au- tonomous program improvement, 2024

Y. Zhang, H. Ruan, Z. Fan, and A. Roychoudhury, “AutoCodeRover: Autonomous program improvement,” 2024. [Online]. Available: https://arxiv.org/abs/2404.05427

work page arXiv 2024

[4] [4]

Agentless: Demystifying LLM-based software engineering agents,

C. S. Xia, Y. Deng, S. Dunn, and L. Zhang, “Agentless: Demystifying LLM-based software engineering agents,”

[5] [5]

Agentless: Demystifying LLM-based Software Engineering Agents

[Online]. Available: https://arxiv.org/abs/2407.01489

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

Claude code documentation: Overview

Anthropic, “Claude code documentation: Overview.” [Online]. Available: https://code.claude.com/docs/en/ overview

[7] [7]

ACON: Optimizing Context Compression for Long-horizon LLM Agents

M. Kang, W.-N. Chen, D. Han, H. A. Inan, L. Wutschitz, Y. Chen, R. Sim, and S. Rajmohan, “ACON: Optimizing context compression for long-horizon LLM agents,” 2025. [Online]. Available: https://arxiv.org/abs/2510.00615

work page internal anchor Pith review Pith/arXiv arXiv 2025

[8] [8]

Codebase-memory: Tree-sitter-based knowledge graphs for LLM code exploration via MCP,

M. Vogel, F. Meyer-Eschenbach, S. Kohler, E. Gr¨ unewald, and F. Balzer, “Codebase-memory: Tree-sitter-based knowledge graphs for LLM code exploration via MCP,” 2026. [Online]. Available: https://arxiv.org/abs/2603.27277

work page arXiv 2026

[9] [9]

Lost in the Middle: How Language Models Use Long Contexts

N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang, “Lost in the middle: How language models use long contexts,”Transactions of the Association for Computational Linguistics, vol. 12, pp. 157–173, 2024. [Online]. Available: https://arxiv.org/abs/2307.03172

work page internal anchor Pith review Pith/arXiv arXiv 2024

[10] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela, “Retrieval-augmented generation for knowledge-intensive NLP tasks,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 9459–9474. [Online]. Available: https://arxiv.org/abs/2005.11401

[11] S. Ouyang, W. Yu, K. Ma, Z. Xiao, Z. Zhang, M. Jia, J. Han, H. Zhang, and D. Yu, “RepoGraph: Enhancing AI software engineering with repository-level code graph,” 2024. [Online]. Available: https://arxiv.org/abs/2410.14684

[12] B. Yang, J. Ren, S. Jin, Y. Liu, F. Liu, B. Le, and H. Tian, “Enhancing repository-level software repair via repository-aware knowledge graphs,” 2025. [Online]. Available: https://arxiv.org/abs/2503.21710

[13] B. Wang, W. Xu, Y. Li, M. Gao, Y. Xie, H. Sun, and D. Chen, “Improving code localization with repository memory,” 2025. [Online]. Available: https://arxiv.org/abs/2510.01003

[14] H. Jiang, Q. Wu, C.-Y. Lin, Y. Yang, and L. Qiu, “LLMLingua: Compressing prompts for accelerated inference of large language models,” 2023. [Online]. Available: https://arxiv.org/abs/2310.05736

[15] ——, “LongLLMLingua: Extending LLMs’ context windows without tuning,” 2024. [Online]. Available: https://arxiv.org/abs/2403.12957

[16] F. Xu, W. Shi, and E. Choi, “RECOMP: Improving retrieval-augmented LMs with compression and selective augmentation,” 2023. [Online]. Available: https://arxiv.org/abs/2310.04408

[17] C. Yoon, T. Lee, H. Hwang, M. Jeong, and J. Kang, “COMPACT: Compressing retrieved documents actively for question answering,” 2024. [Online]. Available: https://arxiv.org/abs/2407.09014

[18] Selective Context Contributors, “Selective Context: Compress input to ChatGPT or other LLMs,” 2023. [Online]. Available: https://github.com/liyucheng09/Selective_Context

[19] A. Chevalier, A. Wettig, A. Ajith, and D. Chen, “Compressing context to enhance inference efficiency of large language models,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 6322–6333. [Online]. Available: https://aclanthology.org/2023.emnlp-main.391/

[20] ctxbudgeter Contributors, “ctxbudgeter: ContextOps toolkit for production AI agents.” [Online]. Available: https://github.com/Kayariyan28/ctxbudgeter

[21] Token Reducer Contributors, “Token Reducer: Local-first context compression for Claude Code.” [Online]. Available: https://github.com/Madhan230205/token-reducer

[22] RTK Contributors, “RTK: Rust token killer.” [Online]. Available: https://github.com/rtk-ai/rtk

[23] Headroom Contributors, “Headroom: The context compression layer for AI agents.” [Online]. Available: https://github.com/headroomlabs-ai/headroom

[24] Bearing Contributors, “Bearing: Task runner for directing AI coding agents.” [Online]. Available: https://github.com/rocketvish/bearing

[25] OpenClaw Contributors, “OpenClaw: Personal AI assistant.” [Online]. Available: https://github.com/openclaw/openclaw

[26] S. Chen, S. Lin, Y. Shi, H. Lian, X. Gu, L. Yun, D. Chen, L. Cao, J. Liu, N. Xia, and Q. Wang, “SWE-Exp: Experience-driven software issue resolution,” 2025. [Online]. Available: https://arxiv.org/abs/2507.23361

[27] F. Mu, J. Wang, L. Shi, S. Wang, S. Li, and Q. Wang, “EXPEREPAIR: Dual-memory enhanced LLM-based repository-level program repair,” 2025. [Online]. Available: https://arxiv.org/abs/2506.10484

[28] Q. Wang, Z. Cheng, S. Zhang, F. Liu, R. Xu, H. Lian, K. Wang, X. Yu, J. Yin, S. Hu, Y. Hu, S. Zhang, Y. Liu, R. Chen, and H. Wang, “MemGovern: Enhancing code agents through learning from governed human experiences,” 2026. [Online]. Available: https://arxiv.org/abs/2601.06789

[29] K. Shen, J. Zhang, C. Sun, W. Zeng, and Y. Yue, “Structurally aligned subtask-level memory for software engineering agents,” 2026. [Online]. Available: https://arxiv.org/abs/2602.21611

[30] M. Li, T. Chen, G. Yang, and J. Li, “MEMCoder: Multi-dimensional evolving memory for private-library-oriented code generation,” 2026. [Online]. Available: https://arxiv.org/abs/2604.24222

[31] M. Li, L. H. Xu, Q. Tan, T. Cao, and Y. Liu, “Learning to commit: Generating organic pull requests via online repository memory,” 2026. [Online]. Available: https://arxiv.org/abs/2603.26664

[32] S. Robertson and H. Zaragoza, “The probabilistic relevance framework: BM25 and beyond,” in Foundations and Trends in Information Retrieval, 2009, vol. 3, no. 4, pp. 333–389.

[33] G. V. Cormack, C. L. A. Clarke, and S. Buettcher, “Reciprocal rank fusion outperforms condorcet and individual rank learning methods,” in Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009, pp. 758–759.

[34] A. Gallant, “ripgrep.” [Online]. Available: https://github.com/BurntSushi/ripgrep

[35] AGFS Contributors, “AGFS: Agent file system.” [Online]. Available: https://github.com/c4pt0r/agfs

[36] Universal Ctags Contributors, “Universal ctags.” [Online]. Available: https://github.com/universal-ctags/ctags

[37] Anthropic, “Claude models overview.” [Online]. Available: https://platform.claude.com/docs/en/about-claude/models/overview

[38] mem0 Contributors, “mem0: Universal memory layer for AI agents.” [Online]. Available: https://github.com/mem0ai/mem0

[39] Letta Contributors, “Letta: Stateful agents and MemGPT.” [Online]. Available: https://github.com/letta-ai/letta

[40] OpenViking Contributors, “OpenViking: Context database for AI agents.” [Online]. Available: https://github.com/volcengine/OpenViking

[41] TencentDB Agent Memory Contributors, “TencentDB Agent Memory.” [Online]. Available: https://github.com/TencentCloud/TencentDB-Agent-Memory

[42] Serena Contributors, “Serena: Semantic coding toolkit for agents.” [Online]. Available: https://github.com/oraios/serena

[43] LlamaIndex Contributors, “LlamaIndex: Framework for agentic applications and retrieval-augmented generation.” [Online]. Available: https://github.com/run-llama/llama_index