pith. sign in

arxiv: 2506.11066 · v3 · pith:B5P7V2ZPnew · submitted 2025-05-31 · 💻 cs.SE · cs.AI

CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval

classification 💻 cs.SE cs.AI
keywords coderetrievalcoquirqualitybenchmarkmodelsquality-awaresoftware
0
0 comments X
read the original abstract

Code retrieval is essential in modern software development, as it boosts code reuse and accelerates debugging. However, current benchmarks primarily emphasize functional relevance while neglecting critical dimensions of software quality. Motivated by this gap, we introduce CoQuIR, the first large-scale, multilingual benchmark specifically designed to evaluate quality-aware code retrieval across four key dimensions: correctness, efficiency, security, and maintainability. CoQuIR provides fine-grained quality annotations for 42,725 queries and 134,907 code snippets in 11 programming languages, and is accompanied by two quality-centric evaluation metrics: Pairwise Preference Accuracy and Margin-based Ranking Score. Using CoQuIR, we benchmark 23 retrieval models, covering both open-source and proprietary systems, and find that even top-performing models frequently fail to distinguish buggy or insecure code from their more robust counterparts. Furthermore, we conduct preliminary investigations into training methods that explicitly encourage retrievers to recognize code quality. Using synthetic datasets, we demonstrate promising improvements in quality-aware metrics across various models, without sacrificing semantic relevance. Downstream code generation experiments further validate the effectiveness of our approach. Overall, our work highlights the importance of integrating quality signals into code retrieval systems, laying the groundwork for more trustworthy and robust software development tools.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. CodeMMR: Bridging Natural Language, Code, and Image for Unified Retrieval

    cs.SE 2026-04 unverdicted novelty 7.0

    CodeMMR creates a unified embedding space for text, code, and images, outperforming baselines by 10 nDCG@10 points and boosting RAG code generation quality.

  2. Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

    cs.SE 2026-05 accept novelty 6.0

    A review of 114 studies creates taxonomies for code and data quality issues, formalizes 18 propagation mechanisms from training data defects to LLM-generated code defects, and synthesizes detection and mitigation techniques.

  3. A Survey of Reasoning-Intensive Retrieval: Progress and Challenges

    cs.IR 2026-04 unverdicted novelty 6.0

    A survey that categorizes RIR benchmarks by domain and modality, proposes a taxonomy for integrating reasoning into retrieval pipelines, and outlines key challenges.