
arxiv: 2504.08975 · v1 · pith:OI4QF7MKnew · submitted 2025-04-11 · 💻 cs.SE · cs.IR

Code-Craft: Hierarchical Graph-Based Code Summarization for Enhanced Context Retrieval

keywords code, retrieval, codebases, hcgs, hierarchical, approach, graph, percentage

Understanding and navigating large-scale codebases remains a significant challenge in software engineering. Existing methods often treat code as flat text or focus primarily on local structural relationships, limiting their ability to provide holistic, context-aware information retrieval. We present Hierarchical Code Graph Summarization (HCGS), a novel approach that constructs a multi-layered representation of a codebase by generating structured summaries in a bottom-up fashion from a code graph. HCGS leverages the Language Server Protocol for language-agnostic code analysis and employs a parallel level-based algorithm for efficient summary generation. In an evaluation on five diverse codebases totaling 7,531 functions, HCGS improves code retrieval accuracy, achieving up to an 82% relative improvement in top-1 retrieval precision on large codebases such as libsignal (a gain of 27.15 percentage points) and perfect Pass@3 scores on smaller repositories. The hierarchical approach consistently outperforms traditional code-only retrieval across all metrics, with particularly substantial gains in larger, more complex codebases where understanding function relationships is crucial.
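
The abstract does not spell out the parallel level-based algorithm, so the following is only a minimal sketch of one way bottom-up summarization over a code graph could be organized. It assumes the graph is an acyclic call graph given as a plain dictionary, assigns each function a level (leaves at level 0, callers one above their deepest callee), and summarizes every level in parallel once the levels below it are done. The names `compute_levels`, `summarize`, and `summarize_graph` are hypothetical, `summarize` is a stand-in for whatever model call produces a real summary, and the thread pool is an illustrative choice, not the paper's implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def compute_levels(calls):
    """Group functions into levels; assumes `calls` (caller -> callees) is acyclic."""
    nodes = set(calls)
    for callees in calls.values():
        nodes.update(callees)

    level = {}

    def depth(node):
        # Leaves get level 0; a caller sits one above its deepest callee.
        if node not in level:
            callees = calls.get(node, [])
            level[node] = 1 + max((depth(c) for c in callees), default=-1)
        return level[node]

    for node in nodes:
        depth(node)

    grouped = defaultdict(list)
    for node, lvl in level.items():
        grouped[lvl].append(node)
    return [sorted(grouped[lvl]) for lvl in sorted(grouped)]


def summarize(name, callee_summaries):
    """Hypothetical placeholder for a model call that writes a summary."""
    if not callee_summaries:
        return f"{name}: leaf"
    return f"{name}: uses [{', '.join(sorted(callee_summaries))}]"


def summarize_graph(calls, max_workers=4):
    """Summarize bottom-up: each level runs in parallel after the ones below it."""
    summaries = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in compute_levels(calls):

            def work(node):
                # Callee summaries are already available from lower levels.
                children = {c: summaries[c] for c in calls.get(node, [])}
                return node, summarize(node, children)

            for node, text in pool.map(work, batch):
                summaries[node] = text
    return summaries


if __name__ == "__main__":
    graph = {"main": ["parse", "run"], "run": ["parse"], "parse": []}
    for name, text in summarize_graph(graph).items():
        print(text)
```

Functions within one level do not depend on each other, which is what makes the per-level parallelism safe in this sketch; a real codebase with recursion would first need its cycles collapsed or broken.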



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Agent4cs: A Multi-agent System for Code Summarization in Large Hierarchical Codebases

    cs.AI 2026-07 unverdicted novelty 5.0

    Agent4cs deploys summarization, keyword-extraction, and quality-assurance agents in a bottom-up pipeline that raises semantic consistency by 8% and normalized keyword coverage by up to 38% over structured prompting ba...