RepoShapley: Shapley-Enhanced Context Filtering for Repository-Level Code Completion

Yu Huo, Kun Zeng, Siyu Zhang, Yuquan Lu, Cheng Yang, Yifu Guo, Xiaoying Tang

classification: cs.SE

keywords: context, completion, reposhapley, retrieval, code, filtering, repository-level, when

Repository-level code completion benefits from retrieval-augmented generation (RAG). However, controlling cross-file evidence is difficult because chunk utility is often interaction-dependent: some snippets help only when paired with complementary context, while others harm decoding when they conflict. We propose RepoShapley, a coalition-aware context filtering framework supervised by Shapley-style marginal contributions. Our offline labeling module, ChunkShapley, estimates signed per-chunk effects via teacher-forced probing, feeds them into a lightweight surrogate game that captures saturation and interference, computes exact Shapley values for small retrieval sets, and selects a decoding-optimal coalition through bounded post-verification with the frozen generator. The verified keep/drop decisions and retrieval triggers are then distilled into a single model via discrete control tokens. Experiments across benchmarks and backbones show that RepoShapley improves completion quality while reducing harmful context and unnecessary retrieval.
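The Shapley step described above can be illustrated with a minimal sketch. It assumes that per-chunk signed effects (the output of teacher-forced probing) are already available as plain numbers. It also assumes a hypothetical surrogate game of our own choosing: a saturating gain from helpful chunks, a linear loss from harmful ones, and a penalty for conflicting pairs. It then computes exact Shapley values by enumerating every coalition. The `make_surrogate_game`, `exact_shapley`, and `select_coalition` names, the functional form of the game, and the `score` callback standing in for verification with the frozen generator are all illustrative assumptions, not the paper's implementation.

```python
import math
from itertools import combinations
from math import factorial


def make_surrogate_game(effects, conflicts, cap=1.0):
    """Hypothetical surrogate game v(S) over a set of chunk ids.

    effects:   chunk id -> signed per-chunk effect (assumed from probing)
    conflicts: (chunk_a, chunk_b) -> penalty when both are kept together
    """
    def value(coalition):
        gain = sum(max(effects[c], 0.0) for c in coalition)
        loss = sum(min(effects[c], 0.0) for c in coalition)
        # Saturation: helpful evidence has diminishing returns, bounded by cap.
        saturated = cap * (1.0 - math.exp(-gain / cap))
        # Interference: pairs of conflicting chunks reduce the value.
        penalty = sum(w for (a, b), w in conflicts.items()
                      if a in coalition and b in coalition)
        return saturated + loss - penalty
    return value


def exact_shapley(players, value):
    """Exact Shapley values by full enumeration (O(n * 2^(n-1)) game calls)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def select_coalition(phi, score, max_checks=4):
    """Keep positive-Shapley chunks, then check a bounded number of
    top-k prefixes with `score` (a stand-in for frozen-generator verification)."""
    ranked = sorted((c for c in phi if phi[c] > 0), key=lambda c: -phi[c])
    best, best_score = frozenset(), score(frozenset())
    for k in range(1, min(len(ranked), max_checks) + 1):
        candidate = frozenset(ranked[:k])
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best


effects = {"a": 0.6, "b": 0.5, "c": -0.3}
conflicts = {("b", "c"): 0.2}
game = make_surrogate_game(effects, conflicts)
phi = exact_shapley(list(effects), game)
kept = select_coalition(phi, score=game)
```

In this toy setting, chunk `c` receives a negative Shapley value because it is harmful on its own and conflicts with `b`. The selected coalition is therefore `{"a", "b"}`. Exhaustive enumeration grows exponentially with the number of chunks, which is why exact values are only practical for small retrieval sets.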

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context
cs.SE 2026-05 accept novelty 7.0

Stale repository context in code RAG actively induces models to produce obsolete helper references, raising stale outputs by 76-88 percentage points over current-only retrieval in a 17-sample diagnostic study.

Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries
cs.CL 2026-05 unverdicted novelty 6.0

GoSkills converts flat skill lists into role-labeled execution contexts via anchor-centered groups and graph expansion, preserving coverage and improving rewards on SkillsBench and ALFWorld under small skill budgets.