…`@-}` annotation, with the exact naming pattern `lemma_<P>_equiv`.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1). Years: 2026 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification
A self-play method using formal proofs and counterexamples trains LLMs to better judge semantic equivalence of Haskell code, yielding up to 13.3 percentage point gains on EquiBench.