AgentPRM: Process Reward Models for LLM Agents via Step-wise Promise and Progress
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
dataset 2
background 1
citation-polarity summary
years
2026 2
verdicts
UNVERDICTED 2
representative citing papers
GEAR adaptively reweights GRPO advantages in LLM RL by using divergence spikes from self-distillation to define semantic segments and modulate local credit.
citing papers explorer
- AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems
  AgentForesight introduces an online auditor model that predicts decisive errors in multi-agent trajectories at the earliest step using a coarse-to-fine reinforcement learning recipe on a new curated dataset AFTraj-2K.
- GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation
  GEAR adaptively reweights GRPO advantages in LLM RL by using divergence spikes from self-distillation to define semantic segments and modulate local credit.