Learning to reason via mixture-of-thought for logical reasoning
3 Pith papers cite this work.
citing papers explorer
- Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data
A parameter-free sampling strategy called CUTS combined with Mixed-CUTS training prevents mode collapse in RL for saturated LLM reasoning tasks and raises AIME25 Pass@1 accuracy by up to 15.1% over standard GRPO.
- Large Lemma Miners: Can LLMs do Induction Proofs for Hardware?
A neurosymbolic method using two LLM prompting frameworks generates provably correct inductive arguments for 84% of a set of mid-size open-source RTL hardware designs.
- VeriTrans: Fine-Tuned LLM-Assisted NL-to-PL Translation via a Deterministic Neuro-Symbolic Pipeline
VeriTrans achieves 94.46% SAT/UNSAT correctness on SatBench via LLM translation gated by round-trip similarity and deterministic neuro-symbolic execution.