rstar-coder: Scaling competitive code reasoning with a large-scale verified dataset

Yifei Liu, Li Lyna Zhang, Yi Zhu, Bingcheng Dong, Xudong Zhou, Ning Shang · 2025 · arXiv 2505.21297

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 3

citation-polarity summary

background 2 support 1

representative citing papers

Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling

cs.LG · 2026-05-11 · conditional · novelty 7.0 · 2 refs

DuST self-trains LLMs for code generation by ranking their own test-time samples via sandbox execution and applying GRPO, improving judgment by +6.2 NDCG and single-sample pass@1 by +3.1 on LiveCodeBench.

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

The LLM-as-Environment-Engineer framework lets the policy model redesign its own RL environments on the new MAPF-FrozenLake testbed, outperforming larger models and fixed baselines with Qwen3-4B.

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

cs.CL · 2026-06-09 · conditional · novelty 6.0

CoT SFT disrupts long-range routing in hybrid models via changes to W_Q and W_K; QK-Restore restores pre-SFT projections to recover NIAH performance.

PaT: Planning-after-Trial for Efficient Test-Time Code Generation

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

PaT defers planning until after failed trials in LLM code generation, enabling heterogeneous cheap-plus-powerful model setups that match large-model performance at roughly 69% lower cost.

You Don't Need Public Tests to Generate Correct Code

cs.SE · 2026-04-23 · unverdicted · novelty 6.0

DryRUN lets LLMs create their own test inputs and run internal simulations for self-correcting code generation, matching the performance of test-dependent methods like CodeSIM on LiveCodeBench without public tests or external signals.

GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

cs.AI · 2026-04-03 · unverdicted · novelty 6.0

GrandCode is the first AI system to consistently beat all human participants and place first in live Codeforces competitive programming contests.

Generating Verifiable Chain of Thoughts from Exection-Traces

cs.SE · 2025-11-28 · unverdicted · novelty 6.0

A pipeline produces 54,000 execution-trace-verified bi-directional Chain-of-Thought rationales for code, and fine-tuning on them yields gains up to 26.6 points on LiveCodeBench-Exec and similar benchmarks.

Toward Training Superintelligent Software Agents through Self-Play SWE-RL

cs.SE · 2025-12-21

citing papers explorer

Showing 6 of 6 citing papers after filters.

Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling cs.LG · 2026-05-11 · conditional · none · ref 7 · 2 links
DuST self-trains LLMs for code generation by ranking their own test-time samples via sandbox execution and applying GRPO, improving judgment by +6.2 NDCG and single-sample pass@1 by +3.1 on LiveCodeBench.
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning cs.CL · 2026-06-16 · unverdicted · none · ref 36
The LLM-as-Environment-Engineer framework lets the policy model redesign its own RL environments on the new MAPF-FrozenLake testbed, outperforming larger models and fixed baselines with Qwen3-4B.
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It cs.CL · 2026-06-09 · conditional · none · ref 36
CoT SFT disrupts long-range routing in hybrid models via changes to W_Q and W_K; QK-Restore restores pre-SFT projections to recover NIAH performance.
PaT: Planning-after-Trial for Efficient Test-Time Code Generation cs.CL · 2026-05-08 · unverdicted · none · ref 40
PaT defers planning until after failed trials in LLM code generation, enabling heterogeneous cheap-plus-powerful model setups that match large-model performance at roughly 69% lower cost.
You Don't Need Public Tests to Generate Correct Code cs.SE · 2026-04-23 · unverdicted · none · ref 15
DryRUN lets LLMs create their own test inputs and run internal simulations for self-correcting code generation, matching the performance of test-dependent methods like CodeSIM on LiveCodeBench without public tests or external signals.
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning cs.AI · 2026-04-03 · unverdicted · none · ref 17
GrandCode is the first AI system to consistently beat all human participants and place first in live Codeforces competitive programming contests.

rstar-coder: Scaling competitive code reasoning with a large-scale verified dataset

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer