Build-bench is the first architecture-aware benchmark that evaluates LLMs on repairing cross-ISA build failures via iterative tool-augmented reasoning, with the best model reaching 63.19% success.
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.SE 2
verdicts
UNVERDICTED 2
representative citing papers
VulWeaver improves Java vulnerability detection to 0.75 F1 by enhancing dependency graphs with LLM semantic fixes, extracting full context from slices plus implicit usage info, and applying type-specific meta-prompting with majority voting.
citing papers explorer
- Can Language Models Go Beyond Coding? Assessing the Capability of Language Models to Build Real-World Systems
  Build-bench is the first architecture-aware benchmark that evaluates LLMs on repairing cross-ISA build failures via iterative tool-augmented reasoning, with the best model reaching 63.19% success.
- VulWeaver: Weaving Broken Semantics for Grounded Vulnerability Detection
  VulWeaver improves Java vulnerability detection to 0.75 F1 by enhancing dependency graphs with LLM semantic fixes, extracting full context from slices plus implicit usage info, and applying type-specific meta-prompting with majority voting.