KBSpec maintains an evolving knowledge base combining external docs and internal verifier feedback to improve LLM generation of verifiable JML specifications, achieving 10-25% higher verification pass rates.
Can LLMs reason about program se- mantics? a comprehensive evaluation of LLMs on formal specification inference
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
fields
cs.SE 3years
2026 3roles
background 1polarities
background 1representative citing papers
CodeSpecBench shows LLMs achieve at most 20.2% pass rate on repository-level executable behavioral specification generation, revealing that strong code generation does not imply deep semantic understanding.
AutoSOUP automates component-level memory-safety verification by generating Safety-Oriented Unit Proofs via three techniques and a hybrid LLM-plus-program-synthesis architecture called LLM-As-Function-Call.
citing papers explorer
-
CodeSpecBench: Benchmarking LLMs for Executable Behavioral Specification Generation
CodeSpecBench shows LLMs achieve at most 20.2% pass rate on repository-level executable behavioral specification generation, revealing that strong code generation does not imply deep semantic understanding.