A self-play method that uses formal proofs and counterexamples trains LLMs to judge the semantic equivalence of Haskell code more accurately, yielding gains of up to 13.3 percentage points on EquiBench.
This purity ensures that two functions are semantically equivalent if they produce the same outputs for all inputs, regardless of how those outputs are computed.
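As a minimal illustration of that definition (a sketch, not taken from the citing paper), the Haskell snippet below defines two pure functions that compute the same sum in different ways, plus a near-miss variant that differs on negative inputs. All names (`sumLoop`, `sumFormula`, `sumFormulaNaive`, `counterexample`) are hypothetical. The bounded search can only refute equivalence by finding a counterexample; establishing equivalence for all inputs would require a proof.

```haskell
-- Illustrative sketch only; every name here is hypothetical.

-- Sum of 1..n computed by explicit accumulation (0 for n <= 0).
sumLoop :: Integer -> Integer
sumLoop n = go 0 1
  where
    go acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

-- Same function via the closed form, guarded for non-positive n.
sumFormula :: Integer -> Integer
sumFormula n
  | n <= 0    = 0
  | otherwise = n * (n + 1) `div` 2

-- Near-miss: agrees with sumLoop for n >= 0 but not for all negative n.
sumFormulaNaive :: Integer -> Integer
sumFormulaNaive n = n * (n + 1) `div` 2

-- Return the first input in the given list on which f and g disagree.
-- A result of Nothing means "no difference found on these inputs",
-- which is evidence of equivalence, not a proof of it.
counterexample :: Eq b => (a -> b) -> (a -> b) -> [a] -> Maybe a
counterexample f g xs =
  case filter (\x -> f x /= g x) xs of
    []      -> Nothing
    (x : _) -> Just x

main :: IO ()
main = do
  print (counterexample sumLoop sumFormula      [-50 .. 50])
  print (counterexample sumLoop sumFormulaNaive [-50 .. 50])
```

Run as written, the first check finds no disagreement on the sampled range and prints `Nothing`, while the second prints `Just (-50)`, the first sampled input on which the two functions return different results.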
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification