
Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards

9 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

years

2026: 8 · 2025: 1

roles

background 2

polarities

background 2


representative citing papers

ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL

cs.AI · 2026-06-01 · unverdicted · novelty 6.0

ReSkill is an RL-in-the-loop framework that embeds assertion-driven skill creation, within-group sampling, and Thompson Sampling into GRPO to reconcile skill evolution with policy learning, outperforming prior methods especially on unseen tasks.

SWE-Shepherd: Advancing PRMs for Reinforcing Code Agents

cs.SE · 2026-04-12 · unverdicted · novelty 5.0

SWE-Shepherd trains a lightweight PRM on SWE-Bench trajectories to score intermediate actions and guide code agents, showing gains in efficiency and action quality on SWE-Bench Verified.

Trading Human Curation for Synthetic Augmentation in RLVR

cs.LG · 2026-06-02 · unverdicted · novelty 4.0

Gated synthetic augmentations can substitute for additional human-authored RLVR tasks at a cost-adjusted trade rate of 1.4x-11.6x while retaining held-out generalization on ten benchmarks spanning code, instruction following, reasoning, and agentic function calling.
