{"work":{"id":"49792b83-569e-4f5f-ae80-e96cbd3b7a43","openalex_id":null,"doi":null,"arxiv_id":"2501.04519","raw_key":null,"title":"rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking","authors":null,"authors_text":"Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu","year":2025,"venue":"cs.CL","abstract":"We present rStar-Math to demonstrate that small language models (SLMs) can rival or even surpass the math reasoning capability of OpenAI o1, without distillation from superior models. rStar-Math achieves this by exercising \"deep thinking\" through Monte Carlo Tree Search (MCTS), where a math policy SLM performs test-time search guided by an SLM-based process reward model. rStar-Math introduces three innovations to tackle the challenges in training the two SLMs: (1) a novel code-augmented CoT data synthesis method, which performs extensive MCTS rollouts to generate step-by-step verified reasoning trajectories used to train the policy SLM; (2) a novel process reward model training method that avoids naïve step-level score annotation, yielding a more effective process preference model (PPM); (3) a self-evolution recipe in which the policy SLM and PPM are built from scratch and iteratively evolved to improve reasoning capabilities. Through 4 rounds of self-evolution with millions of synthesized solutions for 747k math problems, rStar-Math boosts SLMs' math reasoning to state-of-the-art levels. On the MATH benchmark, it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%. On the USA Math Olympiad (AIME), rStar-Math solves an average of 53.3% (8/15) of problems, ranking among the top 20% of the brightest high school math students. 
Code and data will be available at https://github.com/microsoft/rStar.","external_url":"https://arxiv.org/abs/2501.04519","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-23T04:07:30.493572+00:00","pith_arxiv_id":"2501.04519","created_at":"2026-05-10T03:24:14.968784+00:00","updated_at":"2026-05-23T04:07:30.493572+00:00","title_quality_ok":true,"display_title":"rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking","render_title":"rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking"},"hub":{"state":{"work_id":"49792b83-569e-4f5f-ae80-e96cbd3b7a43","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":29,"external_cited_by_count":null,"distinct_field_count":4,"first_pith_cited_at":"2025-02-02T23:20:16+00:00","last_pith_cited_at":"2026-05-16T00:29:35+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-05-25T04:35:25.972244+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":8}],"polarity_counts":[{"context_polarity":"background","n":7},{"context_polarity":"unclear","n":1}],"runs":{},"summary":{},"graph":{},"authors":[]}}