Across Programming Language Silos: A Study on Cross-Lingual Retrieval-augmented Code Generation
Pith reviewed 2026-05-19 11:48 UTC · model grok-4.3
The pith
Retrieval-augmented code generation transfers knowledge across programming languages unevenly, with success tied to language similarity and pretraining diversity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our experiments reveal three key insights: (1) Knowledge transfer in RACG across PLs is non-trivial even using direct injection. (2) RACG exhibits unequal cross-lingual knowledge transfer, and its efficacy depends on linguistic affinity of PL pair and diversity of LLM pretraining corpus. (3) RACG shows limited reliance on natural language information embedded in code when equipped with a code-specific retriever. These findings provide practical guidance for designing effective multilingual RACG systems.
What carries the argument
A newly constructed dataset of nearly 14K instances across 13 programming languages that enables controlled measurement of cross-lingual knowledge transfer in retrieval-augmented code generation.
If this is right
- Multilingual RACG systems should prioritize language pairs that share structural similarities for higher transfer success.
- Greater diversity in an LLM's pretraining corpus improves cross-lingual code generation performance.
- Code-specific retrievers can be used without heavy dependence on natural language comments inside retrieved snippets.
- Direct injection of cross-language retrieved code offers measurable but limited gains for code migration tasks.
Where Pith is reading between the lines
- Future work could test whether adding targeted pretraining on low-resource programming languages closes the observed transfer gaps.
- Hybrid retrievers that blend code structure with selected natural language signals might yield further gains in languages with richer documentation.
- The results point toward practical design rules for retrieval-augmented tools that developers could apply when porting code between specific language pairs.
Load-bearing premise
The newly constructed dataset of nearly 14K instances across 13 programming languages faithfully represents the distribution and difficulty of real-world cross-lingual code generation and migration tasks that developers encounter.
What would settle it
Running the same RACG experiments on an independently gathered set of real code-migration tasks from open-source repositories and finding that the three reported insights do not appear.
Original abstract
Current research on large language models (LLMs) with retrieval-augmented code generation (RACG) has largely focused on single-language settings, leaving their cross-lingual effectiveness underexplored. Multilingual RACG systems are increasingly important for migrating and reusing code across programming languages (PLs), a common yet challenging task in modern software development. To systematically study cross-lingual code knowledge transfer in RACG, we construct a dataset covering 13 PLs with nearly 14K instances. Our experiments reveal three key insights: (1) Knowledge transfer in RACG across PLs is non-trivial even using direct injection. (2) RACG exhibits unequal cross-lingual knowledge transfer, and its efficacy depends on linguistic affinity of PL pair and diversity of LLM pretraining corpus. (3) RACG shows limited reliance on natural language information embedded in code when equipped with a code-specific retriever. These findings provide practical guidance for designing effective multilingual RACG systems. https://github.com/icip-cas/Cross-Lingual-RACG
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper constructs a new dataset of nearly 14K instances across 13 programming languages to empirically study cross-lingual retrieval-augmented code generation (RACG). Experiments on this dataset yield three insights: (1) knowledge transfer across PLs remains non-trivial even with direct injection, (2) transfer efficacy is unequal and depends on linguistic affinity between PL pairs as well as diversity of the LLM pretraining corpus, and (3) RACG exhibits limited reliance on natural language information in code when paired with a code-specific retriever. The work concludes with practical guidance for designing multilingual RACG systems.
Significance. If the central empirical patterns hold, the study provides a valuable contribution by filling a gap in multilingual RACG research, which has been underexplored relative to single-language settings. The newly constructed cross-lingual dataset and the three concrete insights on transfer behaviors offer actionable implications for code migration and reuse tasks. The public release of the dataset and code (via the linked GitHub repository) is a clear strength that supports reproducibility and enables follow-on work.
major comments (2)
- [§3] §3 (Dataset Construction): All three reported insights rest on the newly constructed ~14K-instance dataset. The manuscript provides insufficient detail on the exact filtering rules, pairing criteria for cross-lingual instances, and any external anchoring against real-world migration PRs or production codebases. This leaves open whether the observed unequal transfer and limited NL reliance are general properties of RACG or artifacts of the curation process.
- [§5] §5 (Experimental Results): The second insight on unequal cross-lingual transfer would be strengthened by reporting statistical significance tests (e.g., p-values or confidence intervals) for performance differences across language pairs; without them, it is difficult to rule out that some observed disparities arise from experimental variance rather than systematic linguistic or pretraining effects.
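One way to obtain the p-values requested in the §5 comment is a paired sign-flip permutation test over per-problem scores. The sketch below is illustrative only: the function name `paired_permutation_pvalue`, the idea of pairing scores by problem across two language-pair settings, and the resample count are assumptions, not details taken from the paper.

```python
import random
from statistics import mean


def paired_permutation_pvalue(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-problem scores.

    scores_a and scores_b are hypothetical per-problem results (for example
    0/1 pass indicators) for the same problems under two settings.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed - 1e-12:
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

A small p-value suggests that the gap between the two settings is unlikely to come from problem-level noise alone; with many language pairs, a multiple-comparison correction would still be needed.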
minor comments (2)
- [Abstract] The abstract and introduction could more explicitly quantify the dataset scale and language coverage in the opening sentences to improve immediate clarity for readers.
- [Figures/Tables] Figure captions and table headers would benefit from additional detail on the exact metrics plotted (e.g., which retrieval-augmented vs. baseline configurations are compared) to aid quick interpretation.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and constructive feedback. The comments on dataset transparency and statistical rigor are well-taken. We address each point below and will revise the manuscript accordingly to strengthen clarity and evidence.
- Referee: [§3] §3 (Dataset Construction): All three reported insights rest on the newly constructed ~14K-instance dataset. The manuscript provides insufficient detail on the exact filtering rules, pairing criteria for cross-lingual instances, and any external anchoring against real-world migration PRs or production codebases. This leaves open whether the observed unequal transfer and limited NL reliance are general properties of RACG or artifacts of the curation process.
  Authors: We appreciate the referee’s call for greater methodological transparency. Section 3 of the original manuscript already describes the high-level construction pipeline, but we agree it can be expanded. In the revision we will add an explicit subsection detailing: (1) the precise filtering rules (e.g., minimum token length, removal of duplicates via exact and semantic similarity thresholds, and exclusion of trivial or malformed snippets); (2) the pairing criteria for cross-lingual instances, which rely on functional equivalence verified through unit-test execution and semantic similarity computed via code embeddings; and (3) our rationale for the synthetic construction approach, which prioritizes controlled isolation of linguistic factors over direct anchoring to specific GitHub PRs. While we did not perform an exhaustive audit against production migration datasets, the controlled design allows us to isolate the effects of linguistic affinity and pretraining diversity, the core phenomena under study. These additions will help readers evaluate the scope of our claims.
  revision: yes
- Referee: [§5] §5 (Experimental Results): The second insight on unequal cross-lingual transfer would be strengthened by reporting statistical significance tests (e.g., p-values or confidence intervals) for performance differences across language pairs; without them, it is difficult to rule out that some observed disparities arise from experimental variance rather than systematic linguistic or pretraining effects.
  Authors: We fully agree that statistical tests would increase confidence in the unequal-transfer finding. In the revised Section 5 we will report: (a) 95% confidence intervals around the Pass@1 and Pass@10 metrics for each language pair, and (b) p-values from paired statistical tests (Wilcoxon signed-rank for non-normal distributions and paired t-tests where appropriate) comparing performance differences across pairs. These results will be presented both in the main text and in an expanded appendix table. This addition directly addresses the concern that observed disparities might reflect experimental variance rather than systematic effects of linguistic affinity or pretraining corpus diversity.
  revision: yes
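The confidence intervals around Pass@1 and Pass@10 promised in the response above could be computed along the following lines. This is a minimal sketch under stated assumptions: it uses the unbiased pass@k estimator from Chen et al. (2021), "Evaluating Large Language Models Trained on Code", and a percentile bootstrap over problems. Whether the paper computes Pass@k or its intervals this way is not stated in the source, and the names `pass_at_k` and `bootstrap_ci` are hypothetical.

```python
import random
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples, c of them pass."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-problem scores."""
    rng = random.Random(seed)
    size = len(values)
    # Resample problems with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(values, k=size)) / size for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

In use, `pass_at_k` would be applied to each problem for a given language pair, and `bootstrap_ci` would be called on the resulting list of per-problem scores to obtain a 95% interval for that pair.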
Circularity Check
Empirical measurements on newly constructed dataset yield independent insights
full rationale
The paper's central claims consist of three empirical insights derived from experiments performed on a freshly constructed dataset of nearly 14K instances spanning 13 programming languages. No mathematical derivation, parameter fitting, or self-referential definition is present; the reported patterns on knowledge transfer, linguistic affinity effects, and retriever behavior are direct outputs of the experimental measurements rather than quantities defined in terms of themselves or prior self-citations. The construction and evaluation steps remain self-contained against external benchmarks because they introduce new data and report observable results without reducing any prediction or uniqueness claim to an input by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The constructed dataset instances accurately reflect realistic cross-lingual code generation scenarios.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  We construct a dataset spanning 13 PLs with nearly 14k instances... four experimental settings... Pass@k... adversarial attacks degrade performance
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  RACG exhibits unequal cross-lingual knowledge transfer... depends on linguistic affinity... limited reliance on natural language information
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.