Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards

Jeff Da, Clinton Wang, Xiang Deng, Yuntao Ma, Nikhil Barhate, Sean Hendryx · 2025 · arXiv 2506.11425

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

cs.SE · 2025-09-21 · conditional · novelty 6.0

SWE-Bench Pro is a new benchmark with 1,865 long-horizon tasks from 41 repositories designed to evaluate AI agents on realistic enterprise-level software engineering problems beyond prior benchmarks.

SWE-Shepherd: Advancing PRMs for Reinforcing Code Agents

cs.SE · 2026-04-12 · unverdicted · novelty 5.0

SWE-Shepherd trains a lightweight PRM on SWE-Bench trajectories to score intermediate actions and guide code agents, showing gains in efficiency and action quality on SWE-Bench Verified.

Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective

cs.LG · 2026-05-13 · 2 refs

citing papers explorer

Showing 4 of 4 citing papers.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · none · ref 16

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

cs.SE · 2025-09-21 · conditional · none · ref 6

SWE-Bench Pro is a new benchmark with 1,865 long-horizon tasks from 41 repositories designed to evaluate AI agents on realistic enterprise-level software engineering problems beyond prior benchmarks.

SWE-Shepherd: Advancing PRMs for Reinforcing Code Agents

cs.SE · 2026-04-12 · unverdicted · none · ref 5

SWE-Shepherd trains a lightweight PRM on SWE-Bench trajectories to score intermediate actions and guide code agents, showing gains in efficiency and action quality on SWE-Bench Verified.

Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective

cs.LG · 2026-05-13 · unreviewed · ref 16 · 2 links
