SpecEval: Evaluating Code Comprehension in Large Language Models via Program Specifications

Ma, Lezhi, Liu, Shangqing, Bu, Lei, Li, Shangru, Wang, Yida, Liu, Yang · 2024 · arXiv 2409.12866

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Evaluating the Architectural Reasoning Capabilities of LLM Provers via the Obfuscated Natural Number Game

cs.LG · 2026-05-01 · unverdicted · novelty 6.0

The Obfuscated Natural Number Game shows reasoning LLMs keep proof accuracy without semantic cues while general models degrade, establishing a metric for architectural reasoning in alien math domains.

SpecSyn: LLM-based Synthesis and Refinement of Formal Specifications for Real-world Program Verification

cs.SE · 2026-04-23 · unverdicted · novelty 6.0

SpecSyn generates formal specifications with over 90% precision and 75% recall, successfully verifying 1071 out of 1365 target properties on open-source programs.

citing papers explorer

Evaluating the Architectural Reasoning Capabilities of LLM Provers via the Obfuscated Natural Number Game

cs.LG · 2026-05-01 · unverdicted · none · ref 5

SpecSyn: LLM-based Synthesis and Refinement of Formal Specifications for Real-world Program Verification

cs.SE · 2026-04-23 · unverdicted · none · ref 52
