DuST self-trains LLMs for code generation by ranking their own test-time samples via sandbox execution and applying GRPO, improving judgment by +6.2 NDCG and single-sample pass@1 by +3.1 on LiveCodeBench.
rstar-coder: Scaling competitive code reasoning with a large-scale verified dataset
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3representative citing papers
The LLM-as-Environment-Engineer framework lets the policy model redesign its own RL environments on the new MAPF-FrozenLake testbed, outperforming larger models and fixed baselines with Qwen3-4B.
CoT SFT disrupts long-range routing in hybrid models via changes to W_Q and W_K; QK-Restore restores pre-SFT projections to recover NIAH performance.
PaT defers planning until after failed trials in LLM code generation, enabling heterogeneous cheap-plus-powerful model setups that match large-model performance at roughly 69% lower cost.
DryRUN lets LLMs create their own test inputs and run internal simulations for self-correcting code generation, matching the performance of test-dependent methods like CodeSIM on LiveCodeBench without public tests or external signals.
GrandCode is the first AI system to consistently beat all human participants and place first in live Codeforces competitive programming contests.
A pipeline produces 54,000 execution-trace-verified bi-directional Chain-of-Thought rationales for code, and fine-tuning on them yields gains up to 26.6 points on LiveCodeBench-Exec and similar benchmarks.
citing papers explorer
-
Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling
DuST self-trains LLMs for code generation by ranking their own test-time samples via sandbox execution and applying GRPO, improving judgment by +6.2 NDCG and single-sample pass@1 by +3.1 on LiveCodeBench.
-
From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
The LLM-as-Environment-Engineer framework lets the policy model redesign its own RL environments on the new MAPF-FrozenLake testbed, outperforming larger models and fixed baselines with Qwen3-4B.
-
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It
CoT SFT disrupts long-range routing in hybrid models via changes to W_Q and W_K; QK-Restore restores pre-SFT projections to recover NIAH performance.
-
PaT: Planning-after-Trial for Efficient Test-Time Code Generation
PaT defers planning until after failed trials in LLM code generation, enabling heterogeneous cheap-plus-powerful model setups that match large-model performance at roughly 69% lower cost.
-
You Don't Need Public Tests to Generate Correct Code
DryRUN lets LLMs create their own test inputs and run internal simulations for self-correcting code generation, matching the performance of test-dependent methods like CodeSIM on LiveCodeBench without public tests or external signals.
-
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning
GrandCode is the first AI system to consistently beat all human participants and place first in live Codeforces competitive programming contests.