DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Representative citing papers
Hindsight Hint Distillation synthesizes hindsight hints from failed rollouts to scaffold successful on-policy trajectories from CoT-free QA data, then self-distills them to improve SWE agent performance by 8% on SWE-bench Verified with strong out-of-distribution generalization.
Citing papers explorer
- Revisiting DAgger in the Era of LLM-Agents
DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.
- Hindsight Hint Distillation: Scaffolded Reasoning for SWE Agents from CoT-free Answers
Hindsight Hint Distillation synthesizes hindsight hints from failed rollouts to scaffold successful on-policy trajectories from CoT-free QA data, then self-distills them to improve SWE agent performance by 8% on SWE-bench Verified with strong out-of-distribution generalization.