pith. sign in

arxiv: 2605.16046 · v1 · pith:SIM5TJCMnew · submitted 2026-05-15 · 💻 cs.SE · cs.AI

XSearch: Explainable Code Search via Concept-to-Code Alignment

Pith reviewed 2026-05-20 16:26 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords code searchexplainable retrievalconcept alignmentsemantic searchout-of-distribution generalizationcode retrievalnatural language to code
0
0 comments X

The pith

Reformulating code search as explicit concept-to-code alignment yields better out-of-distribution performance and built-in explanations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current semantic code search relies on embedding similarity between queries and code, which often fails to capture true functional needs and degrades sharply on new data distributions. XSearch instead extracts functional concepts from the natural language query and aligns them directly to specific statements inside candidate code snippets. This turns retrieval into a deductive matching process rather than an inductive similarity guess, producing explanations by design. The resulting model, trained on standard data, reaches substantially higher accuracy on benchmarks that differ from its training distribution and helps users judge results more quickly.

Core claim

XSearch reformulates code search as a deductive concept alignment problem: it identifies functional concepts in the query and explicitly aligns them with corresponding code statements through an encoder trained with concept-alignment objectives, replacing global embedding similarity with explicit matching to produce inherent explanations and reduce shortcut learning that harms generalization.

What carries the argument

Concept-to-code alignment, which extracts functional concepts from queries and matches them to code statements instead of using vector similarity.

If this is right

  • Performance on out-of-distribution benchmarks rises from 0.02 to 0.33, a 15-fold improvement over eight prior retrievers.
  • The method outperforms both encoder and decoder baselines that use up to 7 billion parameters.
  • Concept-alignment explanations allow users to evaluate retrieved code faster and more accurately in studies.
  • The explain-then-predict structure reduces reliance on statistical shortcuts that hurt generalization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same alignment idea could be tested on related retrieval problems such as API recommendation or bug localization.
  • Explicit concept matches might support interactive refinement where users correct or add concepts during search.
  • Hybrid systems could combine the alignments with generative models to produce or edit code that satisfies the identified concepts.

Load-bearing premise

Functional concepts can be reliably identified from natural-language queries and that aligning these concepts to code statements captures the query's true functional requirements without shortcut learning.

What would settle it

Construct a new test set of queries whose functional requirements depend on interactions among multiple code statements that do not map cleanly to single identifiable concepts, then measure whether the reported performance gains over baselines disappear.

Figures

Figures reproduced from arXiv: 2605.16046 by Linpeng Huang, Pengnian Qi, Qianxiang Wang, Ruofan Liu, Weinan Zhang, Weiyu Kong, Xiao Cheng, Yiming Liu, Yun Lin, Zicong Zhang.

Figure 1
Figure 1. Figure 1: A CodeBERT retriever trained on CodeSearchNet exhibits shortcut learning, relying on leading tokens [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: XSearch Model Architecture. During training (LHS), concept-aligned query and code token spans form positive pairs, while non-aligned spans form negative pairs. A shared encoder maps query and code tokens to contextual embeddings. For code tokens, AST type embeddings are added to incorporate structural information. A linear probing head predicts concept-bearing tokens at the token level, and an alignment ob… view at source ↗
Figure 3
Figure 3. Figure 3: LLM-assisted Annotation Pipeline. • Stage 1: Concept Label Augmentation (Section 3.2). We annotate essential concepts in queries and identify the corresponding code units that implement each concept. This stage produces token-level query-code alignment annotations. • Stage 2: Alignment-Aware Model Training (Section 3.3). Using the annotated alignments, we train an alignment-aware retrieval model that joint… view at source ↗
Figure 4
Figure 4. Figure 4: Hyperparameter sweep on the CodeSearchNet-python [ [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Failure Case of CoCoSoDa [69]. Baseline Qwen2.5-Coder-7B Prediction Query python break up a string into dictionaries Retrieved Top 1 Code (Score = 0.66) for word in string.split(' '): print(word, end=' ') Ground Truth Code (Rank > 5) def string_to_dict(string): list_of_entries = string.split(',') list_of_split_entries = map(lambda e: e.split('=‘), list_of_entries) return dict(list_of_split_entries) XSearch… view at source ↗
Figure 6
Figure 6. Figure 6: Failure Case of Qwen2.5-Coder-7B-lt-SupCon-CSN [ [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Failure Case of XSearch. Left: Retrieved Top-1 code for original query. Middle: Ground-truth code for original query, where the query includes an extra concept (“relevant SQL info”) not reflected in the code. Right: Refined query (replacing the extra concept with “relevant query AST”) and the ground-truth code. retrieval, where a sequence is represented by the average over all token embeddings [10]. Under … view at source ↗
Figure 8
Figure 8. Figure 8: Alignment and Highlight Performance Across Languages. [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: The query asks for an in-place merge of two dictionaries and overwrites existing keys in the base [PITH_FULL_IMAGE:figures/full_fig_p016_9.png] view at source ↗
read the original abstract

Semantic code search has been widely adopted in both academia and industry. These approaches embed natural-language queries and code snippets into a shared embedding space and retrieve results based on vector similarity. Despit strong performance on benchmark datasets, they often suffer from poor explainability and generalization. Retrieved code may appear semantically similar yet miss critical functional requirements of the query, while providing no explanation of why the result was retrieved. Moreover, such failures become more severe under distribution shift, where models struggle to generalize to unseen benchmarks. In this work, we propose XSearch, an intrinsically explainable code search framework. Our key insight is that by relying on global embedding similarity, existing retrievers inherently take an inductive view. They learn statistical patterns rather than truly understanding the query's functional requirements. We address this problem by reformulating code search as a deductive concept alignment problem. XSearch (i) identifies functional concepts in the query and (ii) explicitly aligns them with corresponding code statements. This explain-then-predict design produces inherent concept-level explanations and mitigates shortcut learning that harms out-of-distribution generalization. We train an encoder with explicit concept-alignment objectives and perform retrieval through explicit matching between query concepts and code statements. Experiments show that, trained on CodeSearchNet using GraphCodeBERT (125M parameters), XSearch improves performance on out-of-distribution benchmarks from 0.02 to 0.33 (15x) over eight state-of-the-art retrievers, and consistently outperforms both encoder- and decoder-based baselines with up to 7B parameters. A user study demonstrates that concept-alignment explanations enable users to evaluate retrieved results faster and more accurately.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes XSearch, an intrinsically explainable code search framework that reformulates semantic code search as a deductive concept alignment task. It identifies functional concepts from natural-language queries and explicitly aligns them to corresponding code statements via an encoder trained with concept-alignment objectives on CodeSearchNet using GraphCodeBERT. The central claims are a 15× improvement on out-of-distribution benchmarks (0.02 to 0.33) over eight state-of-the-art retrievers, consistent outperformance of encoder- and decoder-based baselines up to 7B parameters, and a user study showing that concept-alignment explanations improve evaluation speed and accuracy.

Significance. If the reported OOD gains and attribution to the deductive reformulation hold after verification, the work would be significant for addressing shortcut learning and explainability deficits in neural code search. The explicit concept-to-statement matching offers a concrete mechanism for inherent explanations, which is a strength relative to post-hoc interpretability methods in the field.

major comments (2)
  1. [Abstract and experimental evaluation] The central OOD generalization claim (0.02 → 0.33) is load-bearing for the paper's contribution, yet the abstract and experimental sections provide no error bars, ablation studies on the alignment loss, or details on the concept identification procedure. Without these, it is impossible to determine whether gains arise from the claimed deductive reformulation or from auxiliary training choices.
  2. [§3 (method overview)] The argument that explicit concept alignment mitigates shortcut learning rests on the premise that step (i)—identifying functional concepts from the query—is reliable and non-inductive. If this step is performed by a learned component trained on CodeSearchNet-style data, the approach reduces to an enhanced inductive model; this distinction is not resolved by the current description and directly affects the validity of the inductive-vs-deductive framing.
minor comments (2)
  1. [§3.1] Clarify the precise definition and extraction rules for 'functional concepts' with a formal notation or running example to aid reproducibility.
  2. [User study] The user study would benefit from reporting participant count, task design, and statistical tests for the observed improvements in speed and accuracy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major comment below and indicate the revisions we will make to strengthen the presentation and clarify key aspects of the work.

read point-by-point responses
  1. Referee: [Abstract and experimental evaluation] The central OOD generalization claim (0.02 → 0.33) is load-bearing for the paper's contribution, yet the abstract and experimental sections provide no error bars, ablation studies on the alignment loss, or details on the concept identification procedure. Without these, it is impossible to determine whether gains arise from the claimed deductive reformulation or from auxiliary training choices.

    Authors: We agree that the current presentation would be strengthened by additional supporting details for the OOD results. In the revised manuscript we will add error bars from multiple random seeds to the reported metrics, include ablation experiments that isolate the contribution of the concept-alignment objectives, and expand the experimental section with a precise description of the concept identification procedure. These changes will make it easier to attribute performance gains to the proposed reformulation. revision: yes

  2. Referee: [§3 (method overview)] The argument that explicit concept alignment mitigates shortcut learning rests on the premise that step (i)—identifying functional concepts from the query—is reliable and non-inductive. If this step is performed by a learned component trained on CodeSearchNet-style data, the approach reduces to an enhanced inductive model; this distinction is not resolved by the current description and directly affects the validity of the inductive-vs-deductive framing.

    Authors: We acknowledge that the current description in §3 does not fully resolve the inductive-versus-deductive distinction for the concept identification step. In the revision we will expand the method overview to explicitly state how functional concepts are extracted from the query (including any reliance on learned versus rule-based components) and to articulate why the overall pipeline still qualifies as deductive relative to standard embedding-based retrievers. This clarification will directly address the validity of the framing. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical gains presented as independent of training objective by construction

full rationale

The paper describes training an encoder on CodeSearchNet with explicit concept-alignment objectives and reports OOD retrieval gains as measured outcomes on held-out benchmarks. No equations, fitting procedures, or self-citations are shown that would make the 0.02-to-0.33 improvement definitionally equivalent to the alignment loss or to any prior result by the same authors. The inductive-versus-deductive framing is advanced as a modeling choice whose validity is tested externally rather than presupposed by the training recipe itself. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that queries contain identifiable functional concepts whose explicit alignment to code statements yields better generalization than global embedding similarity; no free parameters or invented entities with independent evidence are described in the abstract.

axioms (1)
  • domain assumption Functional concepts present in natural-language queries can be identified and aligned to corresponding code statements in a manner that captures the query's functional requirements.
    This premise underpins the explain-then-predict design and the claimed mitigation of shortcut learning.
invented entities (1)
  • functional concepts no independent evidence
    purpose: To decompose queries for explicit matching to code statements.
    Introduced as the core representational unit of the framework; no independent evidence outside the paper is mentioned.

pith-pipeline@v0.9.0 · 5853 in / 1158 out tokens · 74399 ms · 2026-05-20T16:26:44.134997+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

93 extracted references · 93 canonical work pages · 13 internal anchors

  1. [1]

    Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2020. A transformer-based approach for source code summarization.arXiv preprint arXiv:2005.00653(2020)

  2. [2]

    Anonymous. 2025. XSearch. https://sites.google.com/view/xai-search/home Accessed: 2025-03-15

  3. [3]

    Suborno Deb Bappon, Saikat Mondal, and Banani Roy. 2024. AUTOGENICS: Automated Generation of Context-Aware Inline Comments for Code Snippets on Programming Q&A Sites Using LLM. In2024 IEEE International Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 24–35

  4. [4]

    Kaj Bostrom, Harsh Jhamtani, Hao Fang, Sam Thomson, Richard Shin, Patrick Xia, Benjamin Van Durme, Jason Eisner, and Jacob Andreas. 2024. Language-to-Code Translation with a Single Labeled Example. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 8101–8112

  5. [5]

    Wing-Kwan Chan, Hong Cheng, and David Lo. 2012. Searching connected API subgraph via text phrases. InProceedings of the ACM SIGSOFT 20th international symposium on the foundations of software engineering. 1–11

  6. [6]

    Gong Chen, Xiaoyuan Xie, Daniel Tang, Qi Xin, and Wenjie Liu. 2024. HedgeCode: A Multi-Task Hedging Contrastive Learning Framework for Code Search. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). IEEE Computer Society, 89–100

  7. [7]

    Junkai Chen, Xing Hu, Zhenhao Li, Cuiyun Gao, Xin Xia, and David Lo. 2024. Code Search is All You Need? Improving Code Suggestions with Code Search. InProceedings of the IEEE/ACM 46th International Conference on Software Engineering. 1–13

  8. [8]

    Mouxiang Chen, Hao Tian, Zhongxin Liu, Xiaoxue Ren, and Jianling Sun. 2024. Jumpcoder: Go beyond autoregressive coder via online modification.arXiv preprint arXiv:2401.07870(2024)

  9. [9]

    Xinyun Chen, Chang Liu, and Dawn Song. 2018. Tree-to-tree neural networks for program translation.Advances in neural information processing systems31 (2018)

  10. [10]

    Yuxuan Chen, Guangsheng Ou, Mingwei Liu, Yanlin Wang, and Zibin Zheng. 2024. Are Decoder-Only Large Language Models the Silver Bullet for Code Search?arXiv preprint arXiv:2410.22240(2024)

  11. [11]

    Zhenlong Dai, Chang Yao, WenKang Han, Ying Yuan, Zhipeng Gao, and Jingyuan Chen. 2024. Mpcoder: Multi-user personalized code generator with explicit and implicit style representation learning.arXiv preprint arXiv:2406.17255 (2024)

  12. [12]

    Luca Di Grazia and Michael Pradel. 2023. Code search: A survey of techniques for finding code.Comput. Surveys55, 11 (2023), 1–31

  13. [13]

    Aleksandra Eliseeva, Yaroslav Sokolov, Egor Bogomolov, Yaroslav Golubev, Danny Dig, and Timofey Bryksin. 2023. From commit message generation to history-aware commit message completion. In2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 723–735

  14. [14]

    Guodong Fan, Shizhan Chen, Cuiyun Gao, Jianmao Xiao, Tao Zhang, and Zhiyong Feng. 2024. Rapid: Zero-shot domain adaptation for code search with pre-trained models.ACM Transactions on Software Engineering and Methodology33, 5 (2024), 1–35

  15. [15]

    Zhiyu Fan, Xiang Gao, Martin Mirchev, Abhik Roychoudhury, and Shin Hwei Tan. 2023. Automated repair of programs from large language models. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1469–1481

  16. [16]

    Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al . 2020. Codebert: A pre-trained model for programming and natural languages.arXiv preprint arXiv:2002.08155(2020)

  17. [17]

    Michael Fu, Chakkrit Tantithamthavorn, Trung Le, Van Nguyen, and Dinh Phung. 2022. VulRepair: a T5-based automated software vulnerability repair. InProceedings of the 30th ACM joint european software engineering conference and symposium on the foundations of software engineering. 935–947

  18. [18]

    Jing Gong, Yanghui Wu, Linxi Liang, Zibin Zheng, and Yanlin Wang. 2024. CoSQA+: Enhancing Code Search Dataset with Matching Code.arXiv preprint arXiv:2406.11589(2024)

  19. [19]

    Wenchao Gu, Zongyi Lyu, Yanlin Wang, Hongyu Zhang, Cuiyun Gao, and Michael R. Lyu. 2025. SPENCER: Self- Adaptive Model Distillation for Efficient Code Retrieval.ACM Transactions on Software Engineering and Methodology (2025)

  20. [20]

    Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Michael Lyu. 2022. Accelerating code search with deep hashing and code classification. InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2534–2544. 20 Liu et al

  21. [21]

    Xiaodong Gu, Hongyu Zhang, and Sunghun Kim. 2018. Deep code search. InProceedings of the 40th international conference on software engineering. 933–944

  22. [22]

    Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, and Jian Yin. 2022. Unixcoder: Unified cross-modal pre-training for code representation.arXiv preprint arXiv:2203.03850(2022)

  23. [23]

    Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, et al. 2020. Graphcodebert: Pre-training code representations with data flow.arXiv preprint arXiv:2009.08366 (2020)

  24. [24]

    Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Yu Wu, YK Li, et al. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming–The Rise of Code Intelligence. arXiv preprint arXiv:2401.14196(2024)

  25. [25]

    Vincent J Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, and David Bieber. 2019. Global relational models of source code. InInternational conference on learning representations

  26. [26]

    Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, Rishabh Krishnan, and Dawn Song. 2020. Pretrained transformers improve out-of-distribution robustness.arXiv preprint arXiv:2004.06100(2020)

  27. [27]

    Emily Hill, Manuel Roldan-Vega, Jerry Alan Fails, and Greg Mallet. 2014. NL-based query refinement and contextu- alized code search results: A user study. In2014 Software Evolution Week-IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE). IEEE, 34–43

  28. [28]

    Baizhou Huang, Shuai Lu, Weizhu Chen, Xiaojun Wan, and Nan Duan. 2023. Enhancing large language models in coding through multi-perspective self-consistency.arXiv preprint arXiv:2309.17272(2023)

  29. [29]

    Yufan Huang, Mengnan Qi, Yongqiang Yao, Maoquan Wang, Bin Gu, Colin Clement, and Neel Sundaresan. 2023. Program translation via code distillation.arXiv preprint arXiv:2310.11476(2023)

  30. [30]

    Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, et al. 2024. Qwen2.5-coder technical report.arXiv preprint arXiv:2409.12186(2024)

  31. [31]

    Faria Huq, Masum Hasan, Md Mahim Anjum Haque, Sazan Mahbub, Anindya Iqbal, and Toufique Ahmed. 2022. Review4repair: Code review aided automatic program repairing.Information and Software Technology143 (2022), 106765

  32. [32]

    Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. 2019. Codesearchnet challenge: Evaluating the state of semantic code search.arXiv preprint arXiv:1909.09436(2019)

  33. [33]

    Jeevana Priya Inala, Chenglong Wang, Mei Yang, Andres Codas, Mark Encarnación, Shuvendu Lahiri, Madanlal Musuvathi, and Jianfeng Gao. 2022. Fault-aware neural code rankers.Advances in Neural Information Processing Systems35 (2022), 13419–13432

  34. [34]

    Chen Ji, Su Yang, Hongyu Sun, and Yuqing Zhang. 2024. Applying Contrastive Learning to Code Vulnerability Type Classification. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 11942–11952

  35. [35]

    Siyuan Jiang, Jia Li, He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Yu Han, Wei Ning, et al. 2024. aixcoder-7b: A lightweight and effective large language model for code completion.arXiv e-prints(2024), arXiv–2410

  36. [36]

    Zhonghao Jiang, Xiaoxue Ren, Meng Yan, Wei Jiang, Yong Li, and Zhongxin Liu. 2025. CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching.arXiv preprint arXiv:2503.22424(2025)

  37. [37]

    Tae-Hwan Jung. 2021. Commitbert: Commit message generation using pre-trained programming language model. arXiv preprint arXiv:2105.14242(2021)

  38. [38]

    Sungmin Kang, Louis Milliken, and Shin Yoo. 2024. Identifying inaccurate descriptions in llm-generated code comments via test execution.arXiv preprint arXiv:2406.14836(2024)

  39. [39]

    Kisub Kim, Dongsun Kim, Tegawendé F Bissyandé, Eunjong Choi, Li Li, Jacques Klein, and Yves Le Traon. 2018. FaCoY: a code-to-code search engine. InProceedings of the 40th International Conference on Software Engineering. 946–957

  40. [40]

    Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, and Guillaume Lample. 2020. Unsupervised translation of programming languages.arXiv preprint arXiv:2006.03511(2020)

  41. [41]

    Haochen Li, Chunyan Miao, Cyril Leung, Yanxian Huang, Yuan Huang, Hongyu Zhang, and Yanlin Wang. 2022. Exploring representation-level augmentation for code search.arXiv preprint arXiv:2210.12285(2022)

  42. [42]

    Jiawei Li, David Faragó, Christian Petrov, and Iftekhar Ahmed. 2025. Optimization is Better than Generation: Optimizing Commit Message Leveraging Human-written Commit Message.arXiv preprint arXiv:2501.09861(2025)

  43. [43]

    Lingwei Li, Li Yang, Huaxi Jiang, Jun Yan, Tiejian Luo, Zihan Hua, Geng Liang, and Chun Zuo. 2022. AUGER: automatically generating review comments with pre-training models. InProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1009–1021

  44. [44]

    Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, et al. 2023. Starcoder: may the source be with you!arXiv preprint arXiv:2305.06161 (2023)

  45. [45]

    Wen-Ding Li and Kevin Ellis. 2025. Is programming by example solved by llms?Advances in Neural Information Processing Systems37 (2025), 44761–44790. XSearch: Explainable Code Search via Concept-to-Code Alignment 21

  46. [46]

    Xiaonan Li, Yeyun Gong, Yelong Shen, Xipeng Qiu, Hang Zhang, Bolun Yao, Weizhen Qi, Daxin Jiang, Weizhu Chen, and Nan Duan. 2022. Coderetriever: A large scale contrastive pre-training method for code search. InProceedings of the 2022 conference on empirical methods in natural language processing. 2898–2910

  47. [47]

    Zhen Li, Deqing Zou, Shouhuai Xu, Hai Jin, Yawei Zhu, and Zhaoxuan Chen. 2021. Sysevr: A framework for using deep learning to detect software vulnerabilities.IEEE Transactions on Dependable and Secure Computing19, 4 (2021), 2244–2258

  48. [48]

    Zhen Li, Deqing Zou, Shouhuai Xu, Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong. 2018. Vuldeepecker: A deep learning-based system for vulnerability detection.arXiv preprint arXiv:1801.01681(2018)

  49. [49]

    Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision. 2980–2988

  50. [50]

    Chao Liu, Xin Xia, David Lo, Cuiyun Gao, Xiaohu Yang, and John Grundy. 2021. Opportunities and challenges in code search tools.ACM Computing Surveys (CSUR)54, 9 (2021), 1–40

  51. [51]

    Wei Liu, Ailun Yu, Daoguang Zan, Bo Shen, Wei Zhang, Haiyan Zhao, Zhi Jin, and Qianxiang Wang. 2024. Graphcoder: Enhancing repository-level code completion via code context graph-based retrieval and language model.arXiv preprint arXiv:2406.07003(2024)

  52. [52]

    Meili Lu, Xiaobing Sun, Shaowei Wang, David Lo, and Yucong Duan. 2015. Query expansion via wordnet for effective code search. In2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER). IEEE, 545–549

  53. [53]

    Fei Lv, Hongyu Zhang, Jian-guang Lou, Shaowei Wang, Dongmei Zhang, and Jianjun Zhao. 2015. Codehow: Effective code search based on api understanding and extended boolean model (e). In2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 260–270

  54. [54]

    Henry B Mann and Donald R Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other.The annals of mathematical statistics(1947), 50–60

  55. [55]

    Collin McMillan, Mark Grechanik, Denys Poshyvanyk, Qing Xie, and Chen Fu. 2011. Portfolio: finding relevant functions and their usage. InProceedings of the 33rd International Conference on Software Engineering. 111–120

  56. [56]

    Microsoft. 2021. GraphCodeBERT model. https://huggingface.co/microsoft/graphcodebert-base

  57. [57]

    Ansong Ni, Srini Iyer, Dragomir Radev, Veselin Stoyanov, Wen-tau Yih, Sida Wang, and Xi Victoria Lin. 2023. Lever: Learning to verify language-to-code generation with execution. InInternational Conference on Machine Learning. PMLR, 26106–26128

  58. [58]

    Liming Nie, He Jiang, Zhilei Ren, Zeyi Sun, and Xiaochen Li. 2016. Query expansion based on crowd knowledge for code search.IEEE Transactions on Services Computing9, 5 (2016), 771–783

  59. [59]

    Yu Nong, Rainy Sharma, Abdelwahab Hamou-Lhadj, Xiapu Luo, and Haipeng Cai. 2022. Open science in software engineering: A study on deep learning-based vulnerability detection.IEEE Transactions on Software Engineering49, 4 (2022), 1983–2005

  60. [60]

    Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748(2018)

  61. [61]

    OpenAI. 2024. GPT-4o Technical Report. https://openai.com/index/hello-gpt-4o/. Accessed: 2025-01

  62. [62]

    Yun Peng, Shuzheng Gao, Cuiyun Gao, Yintong Huo, and Michael Lyu. 2024. Domain knowledge matters: Improving prompts with fix templates for repairing python type errors. InProceedings of the 46th ieee/acm international conference on software engineering. 1–13

  63. [63]

    David Piorkowski, Austin Z Henley, Tahmid Nabi, Scott D Fleming, Christopher Scaffidi, and Margaret Burnett. 2016. Foraging and navigations, fundamentally: developers’ predictions of value and cost. InProceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 97–108

  64. [64]

    Mukund Raghothaman, Yi Wei, and Youssef Hamadi. 2016. SWIM: synthesizing what I mean: code search and idiomatic snippet synthesis. InProceedings of the 38th International Conference on Software Engineering. 357–367

  65. [65]

    Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, et al. 2023. Code llama: Open foundation models for code.arXiv preprint arXiv:2308.12950 (2023)

  66. [66]

    Baptiste Roziere, Jie M Zhang, Francois Charton, Mark Harman, Gabriel Synnaeve, and Guillaume Lample. 2021. Leveraging automated unit tests for unsupervised code translation.arXiv preprint arXiv:2110.06773(2021)

  67. [67]

    Anthony Saieva, Saikat Chakraborty, and Gail Kaiser. 2024. Reinforest: Reinforcing semantic code similarity for cross-lingual code search models. In2024 IEEE International Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 177–188

  68. [68]

    Chaochen Shi, Borui Cai, Yao Zhao, Longxiang Gao, Keshav Sood, and Yong Xiang. 2023. Coss: Leveraging statement semantics for code summarization.IEEE Transactions on Software Engineering49, 6 (2023), 3472–3486

  69. [69]

    Ensheng Shi, Yanlin Wang, Wenchao Gu, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Hongbin Sun. 2023. Cocosoda: Effective contrastive learning for code search. In2023 IEEE/ACM 45th International Conference on Software 22 Liu et al. Engineering (ICSE). IEEE, 2198–2210

  70. [70]

    Jaspreet Singh and Avishek Anand. 2019. Exs: Explainable search using local model agnostic interpretability. In Proceedings of the twelfth ACM international conference on web search and data mining. 770–773

  71. [71]

    Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2020. MPNet: Masked and Permuted Pre-training for Language Understanding.arXiv preprint arXiv:2004.09297(2020)

  72. [72]

    Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, and Wei Le. 2023. An empirical study of deep learning models for vulnerability detection. In2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2237–2248

  73. [73]

    Elias Stengel-Eskin, Archiki Prasad, and Mohit Bansal. 2024. Regal: Refactoring programs to discover generalizable abstractions.arXiv preprint arXiv:2401.16467(2024)

  74. [74]

    Chia-Yi Su and Collin McMillan. 2024. Distilled GPT for source code summarization.Automated Software Engineering 31, 1 (2024), 22

  75. [75]

    Marc Szafraniec, Baptiste Roziere, Hugh Leather, Francois Charton, Patrick Labatut, and Gabriel Synnaeve. 2022. Code translation with compiler representations.arXiv preprint arXiv:2207.03578(2022)

  76. [76]

    Xunzhu Tang, Saad Ezzini, Haoye Tian, Yewei Song, Jacques Klein, Tegawende F Bissyande, et al. 2023. Hyperbolic code retrieval: a novel approach for efficient code search using hyperbolic space embeddings.arXiv preprint arXiv:2308.15234 (2023)

  77. [77]

    Xunzhu Tang, Kisub Kim, Yewei Song, Cedric Lothritz, Bei Li, Saad Ezzini, Haoye Tian, Jacques Klein, and Tegawendé F Bissyandé. 2024. CodeAgent: Autonomous Communicative Agents for Code Review.arXiv preprint arXiv:2402.02172 (2024)

  78. [78]

    Ze Tang, Xiaoyu Shen, Chuanyi Li, Jidong Ge, Liguo Huang, Zhelin Zhu, and Bin Luo. 2022. AST-trans: Code summarization with efficient tree-structured attention. InProceedings of the 44th International Conference on Software Engineering. 150–162

  79. [79]

    Ali TehraniJamsaz, Arijit Bhattacharjee, Le Chen, Nesreen K Ahmed, Amir Yazdanbakhsh, and Ali Jannesari. 2024. CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming.arXiv preprint arXiv:2410.20527(2024)

  80. [80]

    Rosalia Tufano, Simone Masiero, Antonio Mastropaolo, Luca Pascarella, Denys Poshyvanyk, and Gabriele Bavota

Showing first 80 references.