LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback · 2024 · arXiv 2406.14024

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

cs.CL · 2024-10-10 · conditional · novelty 7.0

Omni-MATH supplies 4,428 human-verified Olympiad-level math problems, on which top LLMs achieve only 52.55% to 60.54% accuracy on the most difficult items.

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning

cs.CL · 2025-07-21 · unverdicted · novelty 6.0

CoLD mitigates length bias in process reward models for mathematical reasoning via counterfactual guidance, length penalties, bias estimation, and joint training, improving step selection accuracy and conciseness on MATH500 and GSM-Plus while boosting downstream RL performance.

The Lessons of Developing Process Reward Models in Mathematical Reasoning

cs.CL · 2025-01-13 · unverdicted · novelty 6.0

Monte Carlo data synthesis for PRMs underperforms LLM-judge and human annotation methods; Best-of-N evaluations suffer from process-outcome misalignment and score inflation; and consensus filtering yields better PRMs with higher data efficiency.

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

cs.CL · 2024-12-25 · unverdicted · novelty 6.0

HuatuoGPT-o1 achieves superior medical complex reasoning by using a verifier to curate reasoning trajectories for fine-tuning and then applying RL with verifier-based rewards.

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

cs.AI · 2025-03-12 · unverdicted · novelty 5.0

The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.

citing papers explorer

Showing 5 of 5 citing papers.

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models
cs.CL · 2024-10-10 · conditional · none · ref 57

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning
cs.CL · 2025-07-21 · unverdicted · none · ref 5

The Lessons of Developing Process Reward Models in Mathematical Reasoning
cs.CL · 2025-01-13 · unverdicted · none · ref 3

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
cs.CL · 2024-12-25 · unverdicted · none · ref 79

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
cs.AI · 2025-03-12 · unverdicted · none · ref 199
