CODEPROMPTZIP: Code-specific Prompt Compression for Retrieval-Augmented Generation in Coding Tasks with LMs

Pengfei He; Shaowei Wang; Tse-Hsun Chen

arxiv: 2502.14925 · v2 · submitted 2025-02-19 · 💻 cs.SE

Pith reviewed 2026-05-23 01:55 UTC · model grok-4.3

keywords: prompt compression, retrieval-augmented generation, code compression, program analysis, token prioritization, copy mechanism, RAG for coding tasks, language model compression

The pith

CodePromptZip compresses code prompts for RAG by ranking program-analysis token types for removal via ablation and training a copy-augmented small LM.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a code-specific compression method for retrieval-augmented generation in software engineering tasks, where prompts often exceed practical context limits. It identifies token types such as identifiers via program analysis, ranks their removal order by their ablation impact on downstream accuracy, and uses those rankings to build training data for a small language model. The model learns to produce compressed versions at chosen ratios, while a built-in copy mechanism lets it retain critical tokens from the original snippet. This targets a gap that general natural-language compressors leave unaddressed for code. A reader would care because shorter yet faithful code contexts would allow more retrieved examples, or lower inference cost, without retraining the main coding model.

Core claim

CodePromptZip uses a type-aware, priority-driven strategy to construct training samples for a code compression model: program analysis identifies token types, ablation ranks those types by their impact on task performance, and a small LM augmented with a copy mechanism is trained on the resulting samples to compress at a specified ratio with minimal performance degradation. It surpasses entropy-based and distillation-based baselines, improving over the best baseline by 23.4 percent on Assertion Generation, 28.7 percent on Bugs2Fix, and 8.7 percent on Code Suggestion.

What carries the argument

Type-aware priority-driven token removal ranking obtained from ablation analysis on program-analysis-identified token types, used to supervise training of a small LM compressor equipped with a copy mechanism.
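A minimal sketch of the ablation-based ranking, assuming each snippet has already been tokenized into (text, type) pairs by some program-analysis pass and that the ablation unit is "remove every token of one type". `rank_removal_priorities`, `drop_type`, and the `evaluate` callback are hypothetical names for illustration, not the paper's code; `evaluate` stands in for running the downstream RAG task and returning a score such as Exact Match.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A token is a (text, type) pair; the type labels (e.g. "Identifier",
# "Symbol") are assumed to come from a program-analysis pass such as a parser.
Token = Tuple[str, str]


def drop_type(snippet: Sequence[Token], token_type: str) -> List[Token]:
    """Return the snippet with every token of `token_type` removed."""
    return [tok for tok in snippet if tok[1] != token_type]


def rank_removal_priorities(
    snippets: Sequence[Sequence[Token]],
    token_types: Sequence[str],
    evaluate: Callable[[Sequence[Sequence[Token]]], float],
) -> List[str]:
    """Rank token types from 'safest to drop' to 'most harmful to drop'.

    `evaluate` is a hypothetical callback that runs the downstream task with
    the given demonstrations and returns a score (higher is better).
    """
    baseline = evaluate(snippets)
    drop_cost: Dict[str, float] = {}
    for t in token_types:
        ablated = [drop_type(s, t) for s in snippets]
        # Performance lost when this type is removed from every example.
        drop_cost[t] = baseline - evaluate(ablated)
    # Smallest loss first: these types would be removed first during compression.
    return sorted(token_types, key=lambda t: drop_cost[t])
```

With a toy scorer that values identifiers most, the ranking puts symbols first (safest to drop) and identifiers last. In the paper's setting the scores would come from running the base LM on task data, which is exactly why the referee asks which data split the ablation used.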

If this is right

More retrieved code examples fit inside fixed context windows for RAG coding workflows.
Prompt processing cost drops while accuracy on assertion generation, bug repair, and code suggestion stays higher than with existing prompt compressors.
Compression ratio becomes a controllable input rather than a fixed hyper-parameter.
The same priority ranking supports multiple downstream coding models without retraining the compressor.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the learned priorities transfer across languages, the approach could serve as a lightweight pre-processing step for multilingual codebases.
Combining the copy-augmented compressor with existing entropy-based filters might yield further length reductions.
Testing the compressor on larger main models would show whether the relative gains persist when the downstream LM itself has greater capacity.

Load-bearing premise

The token-type removal priorities obtained from ablation analysis on the training tasks will remain effective when the compressor is applied to new tasks, different programming languages, or different downstream models.

What would settle it

If applying the trained compressor to a new coding task or programming language yields no gain, or a clear degradation, relative to the strongest baseline, that would falsify the central claim.

Figures

Figures reproduced from arXiv: 2502.14925 by Pengfei He, Shaowei Wang, Tse-Hsun Chen.

**Figure 2.** Framework of CODEPROMPTZIP.
Algorithm 1: Priority-driven Greedy Algorithm for Dataset Construction
Input: x_i^code = {x_j}_{j=1}^L, τ_code, type priorities of T.
Output: x̃_i^code.
1: Initialize a priority queue pq.
2: for each token x_j ∈ x_i^code do
3: Assign a priority to x_j (prioritize dropping high-frequency tokens of the prioritized type).
4: Insert x_j into pq.
5: end for
6: removedTokens ← ∅.
7: L_rm ← ⌊τc…
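The visible steps of Algorithm 1 can be sketched in Python as follows. The listing is truncated at step 7, so everything past that point is an assumption: this sketch takes L_rm = ⌊τ_code · L⌋ as the number of tokens to remove, pops that many entries from the priority queue, and keeps the surviving tokens in their original order. The priority key (type rank, then higher in-snippet frequency, then position) is one reading of "prioritize the drop of high-frequency tokens in prioritized type"; `compress_by_priority` is a hypothetical name.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (text, token type)


def compress_by_priority(
    snippet: Sequence[Token],
    tau_code: float,
    type_priority: Sequence[str],
) -> List[Token]:
    """Greedily drop floor(tau_code * L) tokens, following type priorities.

    `type_priority` lists token types from first-to-drop to last-to-drop.
    Types not listed are treated as lowest priority (dropped last).
    """
    rank = {t: i for i, t in enumerate(type_priority)}
    freq = Counter(text for text, _ in snippet)
    pq: List[Tuple[int, int, int]] = []
    for pos, (text, ttype) in enumerate(snippet):
        # Smaller tuples pop first: earlier-ranked type, then more frequent
        # token text, then earlier position as a deterministic tie-break.
        heapq.heappush(pq, (rank.get(ttype, len(rank)), -freq[text], pos))
    n_remove = math.floor(tau_code * len(snippet))  # assumed meaning of L_rm
    removed = set()
    while len(removed) < n_remove and pq:
        _, _, pos = heapq.heappop(pq)
        removed.add(pos)
    # Keep the surviving tokens in their original order.
    return [tok for pos, tok in enumerate(snippet) if pos not in removed]
```

Pairs of (original snippet, compressed snippet at ratio τ_code) produced this way are the kind of training samples the compressor LM would then learn from.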

**Figure 3.** Illustration of copy mechanism on CodeT5.
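To make the copy mechanism concrete, here is a sketch of one decoding step, assuming it works like the pointer-generator network of See, Liu, and Manning (2017): a gate p_gen mixes the decoder's vocabulary distribution with the attention weights over source tokens, so a token that appears in the original snippet can be emitted even when the generator assigns it little probability. Whether CodePromptZip's CodeT5-based head uses exactly this formulation is an assumption; `copy_augmented_distribution` is a hypothetical helper working on plain dictionaries rather than tensors.

```python
from typing import Dict, Sequence


def copy_augmented_distribution(
    p_vocab: Dict[str, float],
    attention: Sequence[float],
    source_tokens: Sequence[str],
    p_gen: float,
) -> Dict[str, float]:
    """Mix a generation distribution with a copy distribution (pointer-generator style).

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention a_i over
    source positions i whose token x_i equals w.
    """
    final = {w: p_gen * p for w, p in p_vocab.items()}
    for a_i, x_i in zip(attention, source_tokens):
        # Source tokens missing from the vocabulary (e.g. rare identifiers)
        # still receive probability mass through copying.
        final[x_i] = final.get(x_i, 0.0) + (1.0 - p_gen) * a_i
    return final
```

Copying suits a compressor whose training targets are obtained by deleting tokens: identifiers such as `productionsByName` in the compressed examples can be reproduced verbatim from the source.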

**Figure 4.** Trade-off between keeping more tokens in a

**Figure 5.** Compression ratio control.
[Plot panels: (a) Assertion Generation (Exact Match), (b) Bugs2Fix (CodeBLEU), (c) Code Suggestion (CodeBLEU), each for CodeLlama-13B and Gemini-1.0-pro; methods compared: LLMLingua, LongLLMLingua, LLMLingua-2, RECOMP, CodePromptZip.]

**Figure 6.** Performance of the proposed CODEPROMPTZIP across different BLMs (base LMs). Section 6.4, RQ4 (Transferability with Different BLM): CODEPROMPTZIP consistently outperforms baselines across the studied base LMs, CodeLlama-13B and Gemini-1.0.

**Figure 7.** Illustration of different RAG coding tasks.

**Figure 8.** Original Code Examples of Assertion Generation.

**Figure 9.** Compressed Code Examples of Assertion Generation (55 tokens, τ_code: 0.1): ### FOCAL_METHOD getProduction(java.lang.String) return productionsByName; ### UNIT_TEST testJustifications() ; org.jsoar.kernel.Production j = agent; "<AssertPlaceHolder>";

**Figure 10.** Compressed Code Examples of Assertion Generation (39 tokens, τ_code: 0.4): ### BUGGY_CODE public static TYPE_1 init(java.lang.String name, java.util.Date date) { TYPE_1 VAR_1 = new TYPE_1(); VAR_1.METHOD_1(name); java.util.Calendar VAR_2 = java.util.Calendar.getInstance(); VAR_2.METHOD_2(date); VAR_1.METHOD_3(VAR_2); return VAR_1; } ### FIXED_CODE public static TYPE_1 init(java.lang.String name, java.util.D…

**Figure 13.** Original Code Examples of Code Suggestion.

**Figure 14.** Compressed Code Examples of Code Suggestion (121 tokens, τ_code: 0.3).

Original abstract

Retrieval-Augmented Generation (RAG) enhances coding tasks by incorporating retrieved code examples into prompts. However, lengthy prompts, often exceeding tens of thousands of tokens, introduce challenges related to limited context windows of language models (LMs) and high computational costs. Existing prompt compression techniques focus on natural language, lacking tailored solutions for code. To address the gap, we propose CodePromptZip, a framework that compresses code examples before they are integrated into RAG workflows. Our framework employs a type-aware, priority-driven strategy to construct training samples for training a code compression model. By using program analysis, we identify token types (e.g., Identifier) and perform ablation analysis to rank their removal priorities based on their impact on task performance. We then train a small LM as the compressor on these samples, enabling flexible compression conditioned on specified ratios while minimizing performance degradation. In particular, the compressor is augmented with a copy mechanism, allowing tokens to be directly copied from the original code snippets. Evaluation results show that CodePromptZip surpasses SOTA entropy-based and distillation-based baselines, improving by 23.4%, 28.7%, and 8.7% over the best baseline for Assertion Generation, Bugs2Fix, and Code Suggestion, respectively.
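The workflow the abstract describes, compressing retrieved examples before they enter the prompt, can be sketched as below. The sketch assumes the compressor is exposed as a callable taking a code string and a ratio, counts tokens by whitespace splitting as a stand-in for a real tokenizer, and packs compressed demonstrations in retrieval order until a token budget is reached. `build_rag_prompt` and `Compressor` are hypothetical names, not part of the paper's released code.

```python
from typing import Callable, List, Sequence

# Hypothetical compressor interface: (code, tau_code) -> compressed code.
Compressor = Callable[[str, float], str]


def build_rag_prompt(
    demonstrations: Sequence[str],
    query: str,
    compress: Compressor,
    tau_code: float,
    token_budget: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> str:
    """Compress retrieved examples, then pack as many as fit in the budget."""
    parts: List[str] = []
    used = count_tokens(query)  # the query itself always goes in
    for demo in demonstrations:  # assumed sorted by retrieval score
        short = compress(demo, tau_code)
        cost = count_tokens(short)
        if used + cost > token_budget:
            break  # stop at the first example that no longer fits
        parts.append(short)
        used += cost
    return "\n\n".join(parts + [query])
```

Stopping at the first demonstration that does not fit keeps the highest-ranked examples contiguous; skipping it and trying later, shorter ones would pack more tokens at the cost of retrieval order.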

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

CodePromptZip uses program analysis to rank code token types for removal and adds a copy mechanism to a small compressor LM, claiming gains on three coding tasks, but the experiments leave generalization unclear.

The paper's core idea is a compressor for code examples in RAG that tags token types with static analysis, ranks removal order by ablation impact on task accuracy, builds training samples from those ranks, and trains a small LM with an added copy mechanism so it can repeat important tokens from the source. This pipeline is new for code: earlier compression work targeted natural language and did not use program-analysis-driven priorities or copy augmentation in this way. It addresses a practical problem, namely that retrieved code snippets make prompts too long for coding tasks such as assertion generation and bug repair. The reported improvements over entropy-based and distillation-based baselines are the main evidence offered.

The soft spot is the experimental grounding. The abstract gives no detail on whether the ablation that produced the priority list used data held out from the final evaluation sets, or whether the same priorities were tested across languages or model sizes. If the ranking was derived from the same task distributions used for testing, the reported gains (23.4%, 28.7%, and 8.7%) could reflect per-task fitting rather than a transferable strategy. No statistical tests or baseline reimplementation notes appear either.

This is for engineers and researchers who build RAG pipelines for software engineering and need to fit more code examples into fixed contexts. Someone already working on prompt compression or retrieval for code would find the type-aware training data construction worth trying. It deserves peer review because the method is concrete and the results are specific enough to check, even though the paper will need added controls on the priorities and more transparency on the splits.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce CodePromptZip, a framework for compressing code prompts in RAG for coding tasks. It employs program analysis to identify token types, ablation analysis to rank removal priorities based on task performance impact, constructs training samples, and trains a small LM with copy mechanism for flexible compression. Evaluation shows it surpasses SOTA baselines with improvements of 23.4%, 28.7%, and 8.7% on Assertion Generation, Bugs2Fix, and Code Suggestion tasks.

Significance. Should the experimental results prove robust and the compression method generalize across tasks, languages, and models, this could represent a meaningful advance in handling lengthy code contexts for retrieval-augmented generation in software engineering applications, potentially lowering costs and enabling better use of LMs in coding workflows.

major comments (2)

[Abstract] The method's reliance on ablation-derived token removal priorities is load-bearing for the performance claims, yet the description provides no information on whether these ablations were conducted using held-out data or cross-validation separate from the final evaluation tasks. This leaves open the possibility that the reported gains arise from task-specific tuning of the priority list rather than a general code-specific strategy.
[Evaluation results] No details are supplied regarding experimental controls, statistical significance tests, exact baseline implementations, or sensitivity to variations in model size or task distributions, which are necessary to substantiate the central claim of outperforming SOTA methods.

minor comments (1)

Consider adding a dedicated section on limitations, particularly regarding the generalizability of the token-type priorities to new programming languages or downstream models.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.

Referee: [Abstract] The method's reliance on ablation-derived token removal priorities is load-bearing for the performance claims, yet the description provides no information on whether these ablations were conducted using held-out data or cross-validation separate from the final evaluation tasks. This leaves open the possibility that the reported gains arise from task-specific tuning of the priority list rather than a general code-specific strategy.

Authors: We agree the manuscript does not explicitly describe the data split used for ablation analysis. In the revision we will add a dedicated paragraph in the method section stating that ablation studies to derive token-type removal priorities were performed on a held-out validation subset drawn from the training data and kept separate from all test sets used in the final evaluation. We will also report the cross-validation procedure employed to ensure the priority ranking generalizes beyond any single task split. revision: yes
Referee: [Evaluation results] No details are supplied regarding experimental controls, statistical significance tests, exact baseline implementations, or sensitivity to variations in model size or task distributions, which are necessary to substantiate the central claim of outperforming SOTA methods.

Authors: We acknowledge that the current evaluation section lacks these details. The revised manuscript will include: (i) explicit descriptions of experimental controls and the precise re-implementations of all baselines, (ii) statistical significance results (paired t-tests or Wilcoxon signed-rank tests with p-values) across the three tasks, and (iii) sensitivity analyses varying compressor model size and task distribution. These additions will appear in the main evaluation section and an expanded appendix. revision: yes
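The rebuttal promises paired t-tests or Wilcoxon signed-rank tests. As a distribution-free illustration of a paired comparison, the sketch below swaps in an exact sign-flip permutation test on per-example score differences (for instance, Exact Match of CodePromptZip versus the best baseline on the same test items). It enumerates all 2^n sign patterns, so it only suits small n; for full test sets one would sample sign patterns at random or call `scipy.stats.wilcoxon` on the paired scores. `paired_permutation_pvalue` is a hypothetical name.

```python
import itertools
from typing import Sequence


def paired_permutation_pvalue(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided paired sign-flip test on the mean difference a - b.

    Enumerates all 2**n sign assignments, so it is only practical for small n.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    extreme = 0
    for signs in itertools.product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # tolerance for float round-off
            extreme += 1
    return extreme / 2 ** n
```

Ten items that all favor one method by the same margin give p = 2/1024 ≈ 0.002, while identical score lists give p = 1.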

Circularity Check

0 steps flagged

No circularity: the method builds the compressor from program analysis and ablation on training samples, and it is evaluated on the target tasks' test data.

full rationale

The paper describes a pipeline that first uses program analysis to identify token types, performs ablation to rank removal priorities based on task performance impact, constructs training samples from those priorities, and trains a compressor LM (with copy mechanism) on the resulting samples. The reported gains are measured on the evaluation sets of the three target tasks (Assertion Generation, Bugs2Fix, Code Suggestion). No equations, fitted parameters, or self-citations are presented that reduce the final performance numbers to quantities defined by construction inside the paper itself. The derivation chain is therefore self-contained against external benchmarks and does not match any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The framework depends on the assumption that ablation-derived token priorities transfer across tasks and that a small LM can learn to compress at arbitrary ratios while preserving task utility; no new entities are postulated.

free parameters (1)

token removal priority ranking
Derived from ablation analysis measuring impact on downstream task performance; used to label training samples.

axioms (1)

Domain assumption: program analysis correctly identifies token types, and ablation on those types produces stable removal priorities for the target coding tasks.
Invoked when constructing the training samples for the compressor model.
