Afterburner: Reinforcement learning facilitates self-improving code efficiency optimization

Mingzhe Du, Luu Anh Tuan, Yue Liu, Yuhao Qing, Dong Huang, Xinyi He, Qian Liu, Zejun Ma, See-Kiong Ng · 2025 · arXiv 2505.23387

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

cs.LG · 2026-05-27 · conditional · novelty 7.0

Extrapolative weight averaging of RL checkpoints trained under nested unit-test coverage extends a correctness-efficiency frontier and boosts ensemble pass rates in code generation across model scales and inference modes.

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

cs.CL · 2026-06-16 · unverdicted · novelty 6.0

The LLM-as-Environment-Engineer framework lets the policy model redesign its own RL environments on the new MAPF-FrozenLake testbed, outperforming larger models and fixed baselines with Qwen3-4B.

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

cs.CL · 2026-06-09 · conditional · novelty 6.0

CoT SFT disrupts long-range routing in hybrid models via changes to W_Q and W_K; QK-Restore restores pre-SFT projections to recover NIAH performance.

AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

AutoVecCoder combines VecPrompt for automated intrinsic knowledge synthesis and VecRL for efficiency-aligned RL to train an 8B LLM that achieves SOTA on SimdBench SSE/AVX subsets and sometimes exceeds -O3 compiler results.

Paper Espresso: From Paper Overload to Research Insight

cs.DL · 2026-04-06 · unverdicted · novelty 6.0

Paper Espresso deploys LLMs to summarize and analyze trends across 13,300+ arXiv papers over 35 months, releasing metadata that shows non-saturating topic growth and higher engagement for novel topics.

An Initial Exploration of Contrastive Prompt Tuning to Generate Energy-Efficient Code

cs.LG · 2026-03-03 · unverdicted · novelty 5.0

Contrastive Prompt Tuning raises code accuracy on two of three tested models but produces inconsistent energy-efficiency gains that depend on model, language, and task.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

citing papers explorer

Showing 6 of 6 citing papers after filters.

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL

cs.LG · 2026-05-27 · conditional · none · ref 8

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

cs.CL · 2026-06-16 · unverdicted · none · ref 22

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

cs.CL · 2026-06-09 · conditional · none · ref 22

AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code

cs.CL · 2026-05-18 · unverdicted · none · ref 65

Paper Espresso: From Paper Overload to Research Insight

cs.DL · 2026-04-06 · unverdicted · none · ref 14

An Initial Exploration of Contrastive Prompt Tuning to Generate Energy-Efficient Code

cs.LG · 2026-03-03 · unverdicted · none · ref 7
