Across Programming Language Silos: A Study on Cross-Lingual Retrieval-augmented Code Generation

Hongyu Lin; Jialun Cao; Le Sun; Qiming Zhu; Shing-Chi Cheung; WeiLi Zhang; Xianpei Han; Xuanang Chen; Yaojie Lu

arXiv: 2506.03535 · v2 · submitted 2025-06-04 · 💻 cs.SE


Pith reviewed 2026-05-19 11:48 UTC · model grok-4.3

classification 💻 cs.SE

keywords: cross-lingual code generation · retrieval-augmented generation · programming language transfer · multilingual LLMs · code migration · knowledge transfer · RACG


The pith

Retrieval-augmented code generation transfers knowledge across programming languages unevenly, with success tied to language similarity and pretraining diversity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether retrieval-augmented code generation can move useful code knowledge from one programming language to another without heavy retraining. Researchers built a new dataset of nearly 14,000 examples spanning 13 languages to run controlled tests. Direct injection of retrieved code from a different language produces gains but remains difficult. Transfer performs better between languages that share structural traits and when the underlying model encountered many languages during pretraining. Systems using code-focused retrievers draw little additional value from natural language comments inside the source code.
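To make "direct injection" concrete, here is a minimal sketch. It assumes direct injection means placing a retrieved snippet from another language verbatim in the generation prompt, ahead of the target-language task. The bag-of-tokens retriever and the names `code_tokens`, `retrieve`, and `build_prompt` are hypothetical stand-ins. They are not the retrievers or prompt templates used in the paper.

```python
import math
import re
from collections import Counter


def code_tokens(code: str) -> Counter:
    """Lower-cased identifier/keyword tokens of a snippet (a crude, language-agnostic lexer)."""
    return Counter(t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-token vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank corpus snippets by token overlap with the query and keep the top k."""
    q = code_tokens(query)
    return sorted(corpus, key=lambda doc: cosine(q, code_tokens(doc)), reverse=True)[:k]


def build_prompt(task: str, target_lang: str, retrieved: list[str], source_lang: str) -> str:
    """Direct injection: place the retrieved source-language code verbatim before the task."""
    context = "\n\n".join(f"Reference code ({source_lang}):\n{s}" for s in retrieved)
    return f"{context}\n\nWrite the following in {target_lang}:\n{task}\n"


# Toy Java corpus and a Python target task (illustrative only).
java_corpus = [
    "public static int sumList(List<Integer> xs) { int s = 0; for (int x : xs) s += x; return s; }",
    "public static String reverse(String s) { return new StringBuilder(s).reverse().toString(); }",
]
task = 'def sum_list(xs):\n    """Return the sum of xs."""'
print(build_prompt(task, "Python", retrieve(task, java_corpus, k=1), "Java"))
```

A real system would replace the token-overlap ranking with a trained code retriever. The prompt-assembly step is what the paper's "direct injection" setting varies.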

Core claim

Our experiments reveal three key insights: (1) Knowledge transfer in RACG across PLs is non-trivial even using direct injection. (2) RACG exhibits unequal cross-lingual knowledge transfer, and its efficacy depends on linguistic affinity of PL pair and diversity of LLM pretraining corpus. (3) RACG shows limited reliance on natural language information embedded in code when equipped with a code-specific retriever. These findings provide practical guidance for designing effective multilingual RACG systems.
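Insight (2) describes transfer as unequal across PL pairs. One way to make that concrete is to compute, for each (source, target) pair, the gain of cross-lingual RACG over the target language's no-retrieval score. This assumes both scores are available per pair. The gain can be asymmetric: Python-to-Java need not equal Java-to-Python. The function names and the scores in the usage lines are placeholders, not results from the paper.

```python
Pair = tuple[str, str]  # (language of the retrieved code, language being generated)


def transfer_gains(baseline: dict[str, float], racg: dict[Pair, float]) -> dict[Pair, float]:
    """Cross-lingual RACG score minus the no-retrieval baseline of the target language."""
    return {(src, tgt): score - baseline[tgt] for (src, tgt), score in racg.items()}


def asymmetry(gains: dict[Pair, float], a: str, b: str) -> float:
    """Positive when code retrieved in `a` helps `b` more than code in `b` helps `a`."""
    return gains[(a, b)] - gains[(b, a)]


def rank_pairs(gains: dict[Pair, float]) -> list[Pair]:
    """Language pairs ordered from most to least helpful transfer."""
    return sorted(gains, key=gains.get, reverse=True)


# Placeholder scores chosen for illustration; they are not the paper's results.
baseline = {"Java": 0.5, "Python": 0.625}
racg = {("Python", "Java"): 0.75, ("Java", "Python"): 0.75}
gains = transfer_gains(baseline, racg)
print(rank_pairs(gains))                   # [('Python', 'Java'), ('Java', 'Python')]
print(asymmetry(gains, "Python", "Java"))  # 0.125
```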

What carries the argument

A newly constructed dataset of nearly 14K instances across 13 programming languages that enables controlled measurement of cross-lingual knowledge transfer in retrieval-augmented code generation.

If this is right

Multilingual RACG systems should prioritize language pairs that share structural similarities for higher transfer success.
Greater diversity in an LLM's pretraining corpus improves cross-lingual code generation performance.
Code-specific retrievers can be used without heavy dependence on natural language comments inside retrieved snippets.
Direct injection of cross-language retrieved code offers measurable but limited gains for code migration tasks.
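The claim about natural language inside retrieved code is typically tested with an ablation: generate once with the retrieved snippets as-is and once with comments and docstrings removed. The sketch below does the removal for Python snippets only, using the standard-library `ast` module. It is an assumption about how such an ablation could be run, not the paper's procedure. The function name is hypothetical, and the other languages would need their own parsers.

```python
import ast


def strip_natural_language(source: str) -> str:
    """Return Python source with docstrings and comments removed (code semantics kept)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            first = body[0] if body else None
            is_docstring = (
                isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)
            )
            if is_docstring:
                # Keep the block syntactically valid if the docstring was its only statement.
                node.body = body[1:] or [ast.Pass()]
    # ast.unparse never emits comments, so "#" comments are dropped as a side effect.
    return ast.unparse(tree)


snippet = 'def add(a, b):\n    """Add two numbers."""\n    # sum them up\n    return a + b  # result\n'
print(strip_natural_language(snippet))
```

`ast.unparse` also normalizes formatting. A careful ablation would therefore check that formatting changes alone do not move the scores.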

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Future work could test whether adding targeted pretraining on low-resource programming languages closes the observed transfer gaps.
Hybrid retrievers that blend code structure with selected natural language signals might yield further gains in languages with richer documentation.
The results point toward practical design rules for retrieval-augmented tools that developers could apply when porting code between specific language pairs.

Load-bearing premise

The newly constructed dataset of nearly 14K instances across 13 programming languages faithfully represents the distribution and difficulty of real-world cross-lingual code generation and migration tasks that developers encounter.

What would settle it

Running the same RACG experiments on an independently gathered set of real code-migration tasks from open-source repositories and finding that the three reported insights do not appear.

Figures

Figures reproduced from arXiv: 2506.03535 by Hongyu Lin, Jialun Cao, Le Sun, Qiming Zhu, Shing-Chi Cheung, WeiLi Zhang, Xianpei Han, Xuanang Chen, Yaojie Lu.

**Figure 2.** Venn diagrams illustrating the distribution of cases with …

Original abstract

Current research on large language models (LLMs) with retrieval-augmented code generation (RACG) has largely focused on single-language settings, leaving their cross-lingual effectiveness underexplored. Multilingual RACG systems are increasingly important for migrating and reusing code across programming languages (PLs), a common yet challenging task in modern software development. To systematically study cross-lingual code knowledge transfer in RACG, we construct a dataset covering 13 PLs with nearly 14K instances. Our experiments reveal three key insights: (1) Knowledge transfer in RACG across PLs is non-trivial even using direct injection. (2) RACG exhibits unequal cross-lingual knowledge transfer, and its efficacy depends on linguistic affinity of PL pair and diversity of LLM pretraining corpus. (3) RACG shows limited reliance on natural language information embedded in code when equipped with a code-specific retriever. These findings provide practical guidance for designing effective multilingual RACG systems. https://github.com/icip-cas/Cross-Lingual-RACG

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper delivers a new cross-lingual RACG dataset and clear empirical patterns on transfer that depend on language affinity and pretraining, but the dataset's match to real developer tasks stays unverified.


The main point is that this work gives us a fresh dataset across 13 programming languages and shows that retrieval-augmented code generation transfers knowledge unevenly, with success tied to how close the languages are and how diverse the model's pretraining was. Direct injection still leaves gaps, and when paired with a code-specific retriever, the generator relies little on the natural language parts of the code. Those patterns come straight from the experiments on their roughly 14k instances.

Referee Report

2 major / 2 minor

Summary. The paper constructs a new dataset of nearly 14K instances across 13 programming languages to empirically study cross-lingual retrieval-augmented code generation (RACG). Experiments on this dataset yield three insights: (1) knowledge transfer across PLs remains non-trivial even with direct injection, (2) transfer efficacy is unequal and depends on linguistic affinity between PL pairs as well as diversity of the LLM pretraining corpus, and (3) RACG exhibits limited reliance on natural language information in code when paired with a code-specific retriever. The work concludes with practical guidance for designing multilingual RACG systems.

Significance. If the central empirical patterns hold, the study provides a valuable contribution by filling a gap in multilingual RACG research, which has been underexplored relative to single-language settings. The newly constructed cross-lingual dataset and the three concrete insights on transfer behaviors offer actionable implications for code migration and reuse tasks. The public release of the dataset and code (via the linked GitHub repository) is a clear strength that supports reproducibility and enables follow-on work.

major comments (2)

[§3] §3 (Dataset Construction): All three reported insights rest on the newly constructed ~14K-instance dataset. The manuscript provides insufficient detail on the exact filtering rules, pairing criteria for cross-lingual instances, and any external anchoring against real-world migration PRs or production codebases. This leaves open whether the observed unequal transfer and limited NL reliance are general properties of RACG or artifacts of the curation process.
[§5] §5 (Experimental Results): The second insight on unequal cross-lingual transfer would be strengthened by reporting statistical significance tests (e.g., p-values or confidence intervals) for performance differences across language pairs; without them, it is difficult to rule out that some observed disparities arise from experimental variance rather than systematic linguistic or pretraining effects.
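In this setting, two settings are compared on the same problems, and each problem is scored pass or fail. The exact McNemar test is a standard significance test for such paired binary outcomes. It is swapped in here as one option and is not a procedure the paper or referee specifies. The function name and the outcome lists are hypothetical.

```python
from math import comb


def mcnemar_exact(pass_a: list[bool], pass_b: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired pass/fail outcomes on the same instances."""
    if len(pass_a) != len(pass_b):
        raise ValueError("outcomes must be paired instance by instance")
    # Only discordant instances (one setting passes, the other fails) carry information.
    only_a = sum(1 for x, y in zip(pass_a, pass_b) if x and not y)
    only_b = sum(1 for x, y in zip(pass_a, pass_b) if y and not x)
    n = only_a + only_b
    if n == 0:
        return 1.0
    # Under H0 each discordant instance is equally likely to favor either setting: Binomial(n, 1/2).
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical per-instance outcomes: the same 15 problems under two retrieval settings.
with_same_lang = [True] * 9 + [False] + [True] * 5
with_cross_lang = [False] * 9 + [True] + [True] * 5
print(mcnemar_exact(with_same_lang, with_cross_lang))  # 0.021484375
```

With many language pairs tested at once, the resulting p-values would also need a multiple-comparison correction, for example Holm-Bonferroni.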

minor comments (2)

[Abstract] The abstract and introduction could more explicitly quantify the dataset scale and language coverage in the opening sentences to improve immediate clarity for readers.
[Figures/Tables] Figure captions and table headers would benefit from additional detail on the exact metrics plotted (e.g., which retrieval-augmented vs. baseline configurations are compared) to aid quick interpretation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and constructive feedback. The comments on dataset transparency and statistical rigor are well-taken. We address each point below and will revise the manuscript accordingly to strengthen clarity and evidence.


Referee: [§3] §3 (Dataset Construction): All three reported insights rest on the newly constructed ~14K-instance dataset. The manuscript provides insufficient detail on the exact filtering rules, pairing criteria for cross-lingual instances, and any external anchoring against real-world migration PRs or production codebases. This leaves open whether the observed unequal transfer and limited NL reliance are general properties of RACG or artifacts of the curation process.

Authors: We appreciate the referee’s call for greater methodological transparency. Section 3 of the original manuscript already describes the high-level construction pipeline, but we agree it can be expanded. In the revision we will add an explicit subsection detailing: (1) the precise filtering rules (e.g., minimum token length, removal of duplicates via exact and semantic similarity thresholds, and exclusion of trivial or malformed snippets); (2) the pairing criteria for cross-lingual instances, which rely on functional equivalence verified through unit-test execution and semantic similarity computed via code embeddings; and (3) our rationale for the synthetic construction approach, which prioritizes controlled isolation of linguistic factors over direct anchoring to specific GitHub PRs. While we did not perform an exhaustive audit against production migration datasets, the controlled design allows us to isolate the effects of linguistic affinity and pretraining diversity—the core phenomena under study. These additions will help readers evaluate the scope of our claims. revision: yes
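The filtering rules listed in this response (minimum length, exact duplicates, near duplicates) can be sketched as a single pass over candidate snippets. The thresholds and the shingle size are assumptions for illustration. The similarity used here is Jaccard overlap of token shingles, a lexical stand-in for the embedding-based semantic similarity the response mentions, which this sketch does not implement. All function names are hypothetical.

```python
import hashlib
import re


def tokens(code: str) -> list[str]:
    """Words and individual punctuation symbols of a snippet."""
    return re.findall(r"\w+|[^\w\s]", code)


def shingles(code: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of n-token windows used to detect near duplicates."""
    toks = tokens(code)
    return {tuple(toks[i:i + n]) for i in range(max(1, len(toks) - n + 1))}


def jaccard(a: set, b: set) -> float:
    """Overlap of two shingle sets (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dedup(snippets: list[str], min_tokens: int = 5, near_dup: float = 0.8) -> list[str]:
    """Drop trivial snippets, exact duplicates (after whitespace normalization), and near duplicates."""
    kept, seen_hashes, kept_shingles = [], set(), []
    for code in snippets:
        if len(tokens(code)) < min_tokens:
            continue  # trivial snippet
        digest = hashlib.sha256(" ".join(code.split()).encode()).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate up to whitespace
        sh = shingles(code)
        if any(jaccard(sh, other) >= near_dup for other in kept_shingles):
            continue  # near duplicate of something already kept
        seen_hashes.add(digest)
        kept_shingles.append(sh)
        kept.append(code)
    return kept


candidates = [
    "def add(a, b): return a + b",
    "def add(a,  b):\n    return a + b",
    "def add(a, b): return a + b  # sum",
    "def mul(a, b): return a * b",
    "x = 1",
]
print(dedup(candidates))
```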
Referee: [§5] §5 (Experimental Results): The second insight on unequal cross-lingual transfer would be strengthened by reporting statistical significance tests (e.g., p-values or confidence intervals) for performance differences across language pairs; without them, it is difficult to rule out that some observed disparities arise from experimental variance rather than systematic linguistic or pretraining effects.

Authors: We fully agree that statistical tests would increase confidence in the unequal-transfer finding. In the revised Section 5 we will report: (a) 95% confidence intervals around the Pass@1 and Pass@10 metrics for each language pair, and (b) p-values from paired statistical tests (Wilcoxon signed-rank for non-normal distributions and paired t-tests where appropriate) comparing performance differences across pairs. These results will be presented both in the main text and in an expanded appendix table. This addition directly addresses the concern that observed disparities might reflect experimental variance rather than systematic effects of linguistic affinity or pretraining corpus diversity. revision: yes
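The Pass@1 and Pass@10 figures promised above are commonly computed with the unbiased pass@k estimator introduced alongside the HumanEval benchmark. For n generated samples with c correct, it is 1 - C(n-c, k)/C(n, k), averaged over problems. A percentile bootstrap that resamples problems is one simple way to attach a 95% interval. The function names, resampling count, seed, and sample counts below are assumptions, not the paper's procedure.

```python
import random
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem with n generated samples, c of which pass the tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def bootstrap_ci(scores: list[float], iters: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap interval for the mean per-problem score, resampling problems."""
    rng = random.Random(seed)
    m = len(scores)
    means = sorted(sum(rng.choices(scores, k=m)) / m for _ in range(iters))
    lo_idx = int(alpha / 2 * iters)
    return means[lo_idx], means[iters - 1 - lo_idx]


# Hypothetical (samples, correct) counts for four problems in one language pair.
counts = [(10, 0), (10, 3), (10, 10), (10, 5)]
for k in (1, 10):
    scores = [pass_at_k(n, c, k) for n, c in counts]
    print(k, sum(scores) / len(scores), bootstrap_ci(scores))
```

The estimator needs n >= k samples per problem, so reporting Pass@10 assumes at least ten generations for each instance.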

Circularity Check

0 steps flagged

Empirical measurements on newly constructed dataset yield independent insights


The paper's central claims consist of three empirical insights derived from experiments performed on a freshly constructed dataset of nearly 14K instances spanning 13 programming languages. No mathematical derivation, parameter fitting, or self-referential definition is present; the reported patterns on knowledge transfer, linguistic affinity effects, and retriever behavior are direct outputs of the experimental measurements rather than quantities defined in terms of themselves or prior self-citations. The construction and evaluation steps are self-contained: they introduce new data and report observed results, so no prediction or uniqueness claim reduces to an input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The study rests on standard empirical assumptions about dataset representativeness and evaluation metrics rather than introducing new free parameters or postulated entities.

axioms (1)

Domain assumption: The constructed dataset instances accurately reflect realistic cross-lingual code generation scenarios.
The paper uses this dataset to draw conclusions about knowledge transfer; the assumption is invoked when generalizing experimental results to practical multilingual RACG systems.

pith-pipeline@v0.9.0 · 5740 in / 1239 out tokens · 64140 ms · 2026-05-19T11:48:26.172645+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

We construct a dataset spanning 13 PLs with nearly 14k instances... four experimental settings... Pass@k... adversarial attacks degrade performance

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

RACG exhibits unequal cross-lingual knowledge transfer... depends on linguistic affinity... limited reliance on natural language information

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 6 internal anchors

[1]

Evaluating Large Language Models Trained on Code

M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. D. O. Pinto, J. Kaplan, H. Edwards, Y . Burda, N. Joseph, G. Brockman et al., “Evaluating large language models trained on code,” arXiv preprint arXiv:2107.03374 , 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[2]

Competition- level code generation with alphacode,

Y . Li, D. Choi, J. Chung, N. Kushman, J. Schrittwieser, R. Leblond, T. Eccles, J. Keeling, F. Gimeno, A. Dal Lago et al. , “Competition- level code generation with alphacode,” Science, vol. 378, no. 6624, pp. 1092–1097, 2022

work page 2022
[3]

Code Llama: Open Foundation Models for Code

B. Roziere, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y . Adi, J. Liu, R. Sauvestre, T. Remez et al. , “Code llama: Open foundation models for code,” arXiv preprint arXiv:2308.12950 , 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[4]

Jigsaw: Large language models meet program synthesis,

N. Jain, S. Vaidyanath, A. Iyer, N. Natarajan, S. Parthasarathy, S. Ra- jamani, and R. Sharma, “Jigsaw: Large language models meet program synthesis,” in Proceedings of the 44th International Conference on Software Engineering, 2022, pp. 1219–1231

work page 2022
[5]

Docprompting: Generating code by retrieving the docs,

S. Zhou, U. Alon, F. F. Xu, Z. Jiang, and G. Neubig, “Docprompting: Generating code by retrieving the docs,” in The Eleventh International Conference on Learning Representations

work page
[6]

Repocoder: Repository-level code completion through iterative retrieval and generation,

F. Zhang, B. Chen, Y . Zhang, J. Keung, J. Liu, D. Zan, Y . Mao, J.- G. Lou, and W. Chen, “Repocoder: Repository-level code completion through iterative retrieval and generation,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , 2023, pp. 2471–2484

work page 2023
[7]

Swe-bench: Can language models resolve real-world github issues?

C. E. Jimenez, J. Yang, A. Wettig, S. Yao, K. Pei, O. Press, and K. R. Narasimhan, “Swe-bench: Can language models resolve real-world github issues?” in ICLR, 2024

work page 2024
[8]

Retrieval- augmented generation for knowledge-intensive nlp tasks,

P. Lewis, E. Perez, A. Piktus, F. Petroni, V . Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel et al. , “Retrieval- augmented generation for knowledge-intensive nlp tasks,” Advances in Neural Information Processing Systems , vol. 33, pp. 9459–9474, 2020

work page 2020
[9]

Retrieval augmented language model pre-training,

K. Guu, K. Lee, Z. Tung, P. Pasupat, and M. Chang, “Retrieval augmented language model pre-training,” in International conference on machine learning . PMLR, 2020, pp. 3929–3938

work page 2020
[10]

Revisiting and improving retrieval-augmented deep assertion generation,

W. Sun, H. Li, M. Yan, Y . Lei, and H. Zhang, “Revisiting and improving retrieval-augmented deep assertion generation,” in 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) . IEEE, 2023, pp. 1123–1135

work page 2023
[11]

Droidcoder: Enhanced android code completion with context-enriched retrieval-augmented generation,

X. Yu, C. Li, M. Pan, and X. Li, “Droidcoder: Enhanced android code completion with context-enriched retrieval-augmented generation,” in Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering , 2024, pp. 681–693

work page 2024
[12]

Rap-gen: Retrieval- augmented patch generation with codet5 for automatic program repair,

W. Wang, Y . Wang, S. Joty, and S. C. Hoi, “Rap-gen: Retrieval- augmented patch generation with codet5 for automatic program repair,” in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the F oundations of Software Engineering, 2023, pp. 146–158

work page 2023
[13]

Evor: Evolving retrieval for code generation,

H. Su, S. Jiang, Y . Lai, H. Wu, B. Shi, C. Liu, Q. Liu, and T. Yu, “Evor: Evolving retrieval for code generation,” in Findings of the Association for Computational Linguistics: EMNLP 2024 , 2024, pp. 2538–2554

work page 2024
[14]

Prompt-based code completion via multi-retrieval augmented genera- tion,

H. Tan, Q. Luo, L. Jiang, Z. Zhan, J. Li, H. Zhang, and Y . Zhang, “Prompt-based code completion via multi-retrieval augmented genera- tion,” ACM Transactions on Software Engineering and Methodology , 2024

work page 2024
[15]

Rar: Retrieval-augmented retrieval for code generation in low resource lan- guages,

A. Dutta, M. Singh, G. Verbruggen, S. Gulwani, and V . Le, “Rar: Retrieval-augmented retrieval for code generation in low resource lan- guages,” in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing , 2024, pp. 21 506–21 515

work page 2024
[16]

Improving retrieval-augmented code comment generation by retrieving for generation,

H. Lu and Z. Liu, “Improving retrieval-augmented code comment generation by retrieving for generation,” in 2024 IEEE International Conference on Software Maintenance and Evolution (ICSME) . IEEE, 2024, pp. 350–362

work page 2024
[17]

Building a coding assistant via the retrieval-augmented language model,

X. Li, H. Wang, Z. Liu, S. Yu, S. Wang, Y . Yan, Y . Fu, Y . Gu, and G. Yu, “Building a coding assistant via the retrieval-augmented language model,” ACM Transactions on Information Systems , vol. 43, no. 2, pp. 1–25, 2025

work page 2025
[18]

An empirical study of retrieval-augmented code generation: Challenges and opportunities,

Z. Yang, S. Chen, C. Gao, Z. Li, X. Hu, K. Liu, and X. Xia, “An empirical study of retrieval-augmented code generation: Challenges and opportunities,” ACM Transactions on Software Engineering and Methodology, 2025

work page 2025
[19]

CodeRAG-bench: Can retrieval augment code generation?

Z. Z. Wang, A. Asai, X. V . Yu, F. F. Xu, Y . Xie, G. Neubig, and D. Fried, “CodeRAG-bench: Can retrieval augment code generation?” in Findings of the Association for Computational Linguistics: NAACL 2025 , L. Chiruzzo, A. Ritter, and L. Wang, Eds. Albuquerque, New Mexico: Association for Computational Linguistics, Apr. 2025, pp. 3199–3214. [Online]. Avai...

work page 2025
[20]

Preference-guided refactored tuning for retrieval augmented code gen- eration,

X. Gao, Y . Xiong, D. Wang, Z. Guan, Z. Shi, H. Wang, and S. Li, “Preference-guided refactored tuning for retrieval augmented code gen- eration,” in Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering , 2024, pp. 65–77

work page 2024
[21]

Retrieval augmented code generation and summarization,

M. R. Parvez, W. Ahmad, S. Chakraborty, B. Ray, and K.-W. Chang, “Retrieval augmented code generation and summarization,” in Findings of the Association for Computational Linguistics: EMNLP 2021 , 2021, pp. 2719–2734

work page 2021
[22]

Multi-language software development: Issues, challenges, and solutions,

H. Yang, Y . Nong, S. Wang, and H. Cai, “Multi-language software development: Issues, challenges, and solutions,” IEEE Transactions on Software Engineering, vol. 50, no. 3, pp. 512–533, 2024

work page 2024
[23]

How should we build a benchmark? revisiting 274 code-related benchmarks for llms,

J. Cao, Y .-K. Chan, Z. Ling, W. Wang, S. Li, M. Liu, R. Qiao, Y . Han, C. Wang, B. Yu, P. He, S. Wang, Z. Zheng, M. R. Lyu, and S.-C. Cheung, “How should we build a benchmark? revisiting 274 code-related benchmarks for llms,” 2025. [Online]. Available: https://arxiv.org/abs/2501.10711

work page arXiv 2025
[24]

Popularity of programming languages,

D. Ður ¯dev, “Popularity of programming languages,” AIDASCO Reviews, vol. 2, no. 2, pp. 24–29, 2024

work page 2024
[25]

Towards a common under- standing of contributing factors for cross-lingual transfer in multilingual language models: A review,

F. Philippy, S. Guo, and S. Haddadan, “Towards a common under- standing of contributing factors for cross-lingual transfer in multilingual language models: A review,” in The 61st Annual Meeting Of The Association F or Computational Linguistics , 2023

work page 2023
[26]

Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks,

N. Chirkova and V . Nikoulina, “Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks,” in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (V olume 1: Long Papers) , 2024, pp. 7215–7231

work page 2024
[27]

A lightweight polyglot code transformation language,

A. Ketkar, D. Ramos, L. Clapp, R. Barik, and M. K. Ramanathan, “A lightweight polyglot code transformation language,” Proceedings of the ACM on Programming Languages , vol. 8, no. PLDI, pp. 1288–1312, 2024

work page 2024
[28]

Scal- able, validated code translation of entire projects using large language models,

H. Zhang, C. David, M. Wang, B. Paulsen, and D. Kroening, “Scal- able, validated code translation of entire projects using large language models,” arXiv preprint arXiv:2412.08035 , 2024

work page arXiv 2024
[29]

On multi-language software development, cross-language links and accompanying tools: a survey of professional software developers,

P. Mayer, M. Kirsch, and M. A. Le, “On multi-language software development, cross-language links and accompanying tools: a survey of professional software developers,” Journal of Software Engineering Research and Development , vol. 5, pp. 1–33, 2017

work page 2017
[30]

Legacy web application modernization by generating a rest service layer,

R. R. Echeverria, F. Macias, V . M. Pavon, J. M. Conejero, and F. S. Figueroa, “Legacy web application modernization by generating a rest service layer,” IEEE Latin America Transactions , vol. 13, no. 7, pp. 2379–2383, 2015

work page 2015
[31]

Challenges in migrating legacy software systems to the cloud—an empirical study,

M. F. Gholami, F. Daneshgar, G. Beydoun, and F. Rabhi, “Challenges in migrating legacy software systems to the cloud—an empirical study,” Information Systems , vol. 67, pp. 100–113, 2017

work page 2017
[32]

Knowledge transfer from high-resource to low-resource programming languages for code llms,

F. Cassano, J. Gouwar, F. Lucchetti, C. Schlesinger, A. Freeman, C. J. Anderson, M. Q. Feldman, M. Greenberg, A. Jangda, and A. Guha, “Knowledge transfer from high-resource to low-resource programming languages for code llms,” Proceedings of the ACM on Programming Languages, vol. 8, no. OOPSLA2, pp. 677–708, 2024

work page 2024
[33]

Speq: Translation of sparse codes using equivalences,

A. Laird, B. Liu, N. Bjørner, and M. M. Dehnavi, “Speq: Translation of sparse codes using equivalences,” Proceedings of the ACM on Programming Languages, vol. 8, no. PLDI, pp. 1680–1703, 2024

work page 2024
[34]

Poisonedrag: Knowledge corruption attacks to retrieval- augmented generation of large language models

W. Zou, R. Geng, B. Wang, and J. Jia, “Poisonedrag: Knowledge corruption attacks to retrieval-augmented generation of large language models,” arXiv preprint arXiv:2402.07867 , 2024

work page arXiv 2024
[35]

From allies to adversaries: Manipulating LLM tool-calling through adversarial injection,

R. Zhang, H. Wang, J. Wang, M. Li, Y . Huang, D. Wang, and Q. Wang, “From allies to adversaries: Manipulating LLM tool-calling through adversarial injection,” in Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (V olume 1: Long Papers) , L. Chiruzzo, A. ...

work page 2025
[36]

Poisoning web- scale training datasets is practical,

N. Carlini, M. Jagielski, C. A. Choquette-Choo, D. Paleka, W. Pearce, H. Anderson, A. Terzis, K. Thomas, and F. Tramèr, “Poisoning web- scale training datasets is practical,” in2024 IEEE Symposium on Security and Privacy (SP) . IEEE, 2024, pp. 407–425

work page 2024
[37]

Artifact of this paper

Anonymous, “Artifact of this paper.” [Online]. Available: https: //anonymous.4open.science/r/Cross-Lingual-RACG-0F3C

work page
[38]

Adversarial Robustness of Deep Code Comment Generation,

Y . Zhou, X. Zhang, J. Shen, T. Han, and T. Chen, “Adversarial Robustness of Deep Code Comment Generation,” ACM Transactions on Software Engineering and Methodology , vol. 31, no. 4, pp. 1–30, Oct. 2022

work page 2022
[39]

Analyzing apis documentation and code to detect directive defects,

Y . Zhou, R. Gu, T. Chen, Z. Huang, S. Panichella, and H. Gall, “Analyzing apis documentation and code to detect directive defects,” in 2017 IEEE/ACM 39th International Conference on Software Engi- neering (ICSE) . IEEE, 2017, pp. 27–37

work page 2017
[40]

Codecleaner: Elevating standards with a robust data contamination mitigation toolkit,

J. Cao, S. Chen, W. Zhang, H. C. Lo, and S.-C. Cheung, “Codecleaner: Elevating standards with a robust data contamination mitigation toolkit,”

work page
[41]

Available: https://arxiv.org/abs/2411.10842

[Online]. Available: https://arxiv.org/abs/2411.10842

work page arXiv
[42]

Software documentation: the practitioners’ perspective,

E. Aghajani, C. Nagy, M. Linares-Vásquez, L. Moreno, G. Bavota, M. Lanza, and D. C. Shepherd, “Software documentation: the practitioners’ perspective,” in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering , ser. ICSE ’20. New York, NY , USA: Association for Computing Machinery, 2020, p. 590–601. [Online]. Available: https:/...

work page doi:10.1145/3377811.3380405 2020
[43]

Codegeex: A pre-trained model for code generation with multilingual benchmarking on humaneval-x,

Q. Zheng, X. Xia, X. Zou, Y . Dong, S. Wang, Y . Xue, L. Shen, Z. Wang, A. Wang, Y . Li et al. , “Codegeex: A pre-trained model for code generation with multilingual benchmarking on humaneval-x,” in Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , 2023, pp. 5673–5684

work page 2023
[44]

Multi-lingual evaluation of code generation models,

B. Athiwaratkun, S. K. Gouda, Z. Wang, X. Li, Y . Tian, M. Tan, W. U. Ahmad, S. Wang, Q. Sun, M. Shang et al., “Multi-lingual evaluation of code generation models,” in The Eleventh International Conference on Learning Representations

work page
[45]

Mceval: Massively multilingual code evaluation,

L. Chai, S. Liu, J. Yang, Y . Yin, K. Jin, J. Liu, T. Sun, G. Zhang, C. Ren, H. Guo et al. , “Mceval: Massively multilingual code evaluation,” arXiv preprint arXiv:2406.07436, 2024

work page arXiv 2024
[46]

A survey of automatic generation of source code comments: Algorithms and techniques,

X. Song, H. Sun, X. Wang, and J. Yan, “A survey of automatic generation of source code comments: Algorithms and techniques,” IEEE Access , vol. 7, pp. 111 411–111 428, 2019

work page 2019
[47]

Cornstack: High-quality contrastive data for better code ranking,

T. Suresh, R. G. Reddy, Y . Xu, Z. Nussbaum, A. Mulyar, B. Duderstadt, and H. Ji, “Cornstack: High-quality contrastive data for better code ranking,” arXiv preprint arXiv:2412.01007 , 2024

work page arXiv 2024
[48]

DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

D. Guo, Q. Zhu, D. Yang, Z. Xie, K. Dong, W. Zhang, G. Chen, X. Bi, Y . Wu, Y . Li et al. , “Deepseek-coder: When the large language model meets programming–the rise of code intelligence,” arXiv preprint arXiv:2401.14196, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[49]

Qwen2.5-Coder Technical Report

B. Hui, J. Yang, Z. Cui, J. Yang, D. Liu, L. Zhang, T. Liu, J. Zhang, B. Yu, K. Lu et al. , “Qwen2. 5-coder technical report,” arXiv preprint arXiv:2409.12186, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[50]

Textbooks Are All You Need

S. Gunasekar, Y . Zhang, J. Aneja, C. C. T. Mendes, A. Del Giorno, S. Gopi, M. Javaheripi, P. Kauffmann, G. de Rosa, O. Saarikivi et al. , “Textbooks are all you need,” arXiv preprint arXiv:2306.11644 , 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[51]

Textbooks Are All You Need II: phi-1.5 technical report

Y . Li, S. Bubeck, R. Eldan, A. Del Giorno, S. Gunasekar, and Y . T. Lee, “Textbooks are all you need ii: phi-1.5 technical report,” arXiv preprint arXiv:2309.05463, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[52]

When llms meet api documentation: Can retrieval augmentation aid code generation just as it helps developers?

J. Chen, S. Chen, J. Cao, J. Shen, and S.-C. Cheung, “When llms meet api documentation: Can retrieval augmentation aid code generation just as it helps developers?” 2025. [Online]. Available: https://arxiv.org/abs/2503.15231

work page arXiv 2025
[53]

Multipl-e: a scalable and polyglot approach to benchmarking neural code generation,

F. Cassano, J. Gouwar, D. Nguyen, S. Nguyen, L. Phipps-Costin, D. Pinckney, M.-H. Yee, Y . Zi, C. J. Anderson, M. Q. Feldman et al. , “Multipl-e: a scalable and polyglot approach to benchmarking neural code generation,” IEEE Transactions on Software Engineering , vol. 49, no. 7, pp. 3675–3691, 2023

[54]

ReACC: A retrieval-augmented code completion framework

S. Lu, N. Duan, H. Han, D. Guo, S.-w. Hwang, and A. Svyatkovskiy, “ReACC: A retrieval-augmented code completion framework,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 6227–6240

[55]

Large language model-aware in-context learning for code generation

J. Li, C. Tao, J. Li, G. Li, Z. Jin, H. Zhang, Z. Fang, and F. Liu, “Large language model-aware in-context learning for code generation,” ACM Transactions on Software Engineering and Methodology, 2023

[56]

CodeGRAG: Extracting composed syntax graphs for retrieval augmented cross-lingual code generation

K. Du, R. Rui, H. Chai, L. Fu, W. Xia, Y. Wang, R. Tang, Y. Yu, and W. Zhang, “CodeGRAG: Extracting composed syntax graphs for retrieval augmented cross-lingual code generation,” arXiv preprint arXiv:2405.02355, 2024

[57]

CRUXEval-X: A benchmark for multilingual code reasoning, understanding and execution, 2025

R. Xu, J. Cao, Y. Lu, H. Lin, X. Han, B. He, S.-C. Cheung, and L. Sun, “CRUXEval-X: A benchmark for multilingual code reasoning, understanding and execution,” arXiv preprint arXiv:2408.13001, 2024

[58]

Poison-RAG: Adversarial data poisoning attacks on retrieval-augmented generation in recommender systems

F. Nazary, Y. Deldjoo, and T. Di Noia, “Poison-RAG: Adversarial data poisoning attacks on retrieval-augmented generation in recommender systems,” in European Conference on Information Retrieval. Springer, 2025, pp. 239–251

[59]

BadRAG: Identifying vulnerabilities in retrieval augmented generation of large language models

J. Xue, M. Zheng, Y. Hu, F. Liu, X. Chen, and Q. Lou, “BadRAG: Identifying vulnerabilities in retrieval augmented generation of large language models,” arXiv preprint arXiv:2406.00083, 2024

[60]

Exploring the security threats of knowledge base poisoning in retrieval-augmented code generation

B. Lin, S. Wang, L. Chen, and X. Mao, “Exploring the security threats of knowledge base poisoning in retrieval-augmented code generation,” [Online]. Available: https://arxiv.org/abs/2502.03233
