VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization

Fuxun Yu; Weiliang Qi; Xinda Wang; Youpeng Li

arxiv: 2511.11896 · v3 · pith:P6NSXLF2new · submitted 2025-11-14 · 💻 cs.CR · cs.AI · cs.SE


Youpeng Li, Fuxun Yu, Weiliang Qi, Xinda Wang

classification 💻 cs.CR cs.AI cs.SE

keywords vulnerability reasoning optimization vulpo contextual existing context-aware contextvul


Large language models (LLMs) have recently shown strong potential in vulnerability detection (VD). However, accurately detecting vulnerabilities in real-world repositories requires reasoning over complex contextual interactions. Existing LLM-based VD approaches remain limited because current datasets lack complete contextual information and high-quality reasoning supervision, while existing optimization methods primarily rely on coarse outcome-centric supervision signals that fail to model the vulnerability reasoning process. To address these limitations, we first construct ContextVul, a new dataset that augments high-quality function-level vulnerability benchmarks with repository-level contextual information and curated vulnerability reasoning traces. Building upon ContextVul, we introduce a two-stage optimization framework consisting of lightweight cold-start supervised fine-tuning followed by vulnerability-adaptive on-policy optimization (VULPO). VULPO incorporates multidimensional rewards that jointly evaluate vulnerability identification, vulnerability-relevant localization, and causal reasoning quality, along with difficulty-adaptive reward scaling to mitigate reward hacking and improve RL effectiveness. Extensive experiments demonstrate the superiority of VULPO for context-aware VD. Our VULPO-4B, the first specialized vulnerability reasoning LLM, substantially outperforms existing VD baselines, improving Pairwise Pass@1 by 203% relative to Qwen3-4B and achieving competitive performance against a 150× larger-scale LLM, DeepSeek-V3.1.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Program Structure-aware Language Models: Targeted Software Testing beyond Textual Semantics
cs.SE · 2026-04 · unverdicted · novelty 6.0

GLMTest integrates code property graphs and GNNs with LLMs to steer test case generation toward targeted branches, raising branch accuracy from 27.4% to 50.2% on the TestGenEval benchmark.