Mitigating catastrophic forgetting in language transfer via model merging
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover
Heavy supervised fine-tuning on formal math suppresses tool-calling in Goedel-Prover-V2 from 89.4% to near 0%, but 100 Lean agentic traces restore it to 83.8% on the Berkeley Function Calling Leaderboard with in-domain gains on ProofNet.