arXiv preprint arXiv:2507.08649 , year=

Leanabell-prover-v2: Verifier-integrated reasoning for formal theorem proving via reinforcement learning , author= · 2025 · arXiv 2507.08649

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

OProver: A Unified Framework for Agentic Formal Theorem Proving

cs.CL · 2026-05-17 · unverdicted · novelty 6.0

OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.

SPS: Steering Probability Squeezing for Better Exploration in Reinforcement Learning for Large Language Models

cs.CL · 2026-04-18 · unverdicted · novelty 6.0

SPS interleaves RL and IRL to counteract probability squeezing in LLM reasoning trajectories, improving Pass@k on five benchmarks while identifying an empirical upper bound on multi-sample performance.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

citing papers explorer

Showing 3 of 3 citing papers.

OProver: A Unified Framework for Agentic Formal Theorem Proving cs.CL · 2026-05-17 · unverdicted · none · ref 159
OProver-32B achieves top Pass@32 scores on MiniF2F, ProverBench, and PutnamBench by combining continued pretraining with iterative agentic proving, retrieval, SFT on repairs, and RL on unresolved cases using a 6.86M-proof dataset.
SPS: Steering Probability Squeezing for Better Exploration in Reinforcement Learning for Large Language Models cs.CL · 2026-04-18 · unverdicted · none · ref 14
SPS interleaves RL and IRL to counteract probability squeezing in LLM reasoning trajectories, improving Pass@k on five benchmarks while identifying an empirical upper bound on multi-sample performance.
A Survey of Reinforcement Learning for Large Reasoning Models cs.CL · 2025-09-10 · accept · none · ref 232
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

arXiv preprint arXiv:2507.08649 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer