SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.
arXiv preprint arXiv:2512.14895 , year=
6 Pith papers cite this work. Polarity classification is still indexing.
years
2026 6representative citing papers
TCOD stabilizes on-policy distillation for multi-turn agents via temporal curriculum on trajectory depth, improving performance up to 18 points over vanilla OPD and sometimes surpassing the teacher.
SGCD generates supervision for off-trajectory states in GUI agents by mixing expert trajectories with continuations produced by a skill-guided policy after the base policy reaches those states.
SRC is a fixed-horizon branch review framework for imitation learning in resettable web environments that collects 977 verifier-passing trajectories and 9,183 next-action examples while improving recovery-versus-query tradeoff over step-level review.
LearnWeak specializes small CUAs via weakness detection by a reference agent, targeted task synthesis, and error-aware training, delivering 11+ point gains on OSWorld.
DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.
citing papers explorer
-
Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift
SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.
-
TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
TCOD stabilizes on-policy distillation for multi-turn agents via temporal curriculum on trajectory depth, improving performance up to 18 points over vanilla OPD and sometimes surpassing the teacher.
-
Skill-Guided Continuation Distillation for GUI Agents
SGCD generates supervision for off-trajectory states in GUI agents by mixing expert trajectories with continuations produced by a skill-guided policy after the base policy reaches those states.
-
Speculative Rollback Correction for Quality-Diverse Web Agent Imitation
SRC is a fixed-horizon branch review framework for imitation learning in resettable web environments that collects 977 verifier-passing trajectories and 9,183 next-action examples while improving recovery-versus-query tradeoff over step-level review.
-
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
LearnWeak specializes small CUAs via weakness detection by a reference agent, targeted task synthesis, and error-aware training, delivering 11+ point gains on OSWorld.