TRACER attaches verifiable sentence-level provenance records to multimodal agent outputs using tool-turn alignment and semantic relations, yielding 78.23% answer accuracy and fewer tool calls than baselines on TRACE-Bench.
From novice to expert: Llm agent policy optimization via step-wise reinforcement learning
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CL 3years
2026 3verdicts
UNVERDICTED 3representative citing papers
SPS interleaves RL and IRL to counteract probability squeezing in LLM reasoning trajectories, improving Pass@k on five benchmarks while identifying an empirical upper bound on multi-sample performance.
A survey of credit assignment techniques in LLM reinforcement learning that distinguishes maturing methods for reasoning from new approaches needed for agentic settings and provides supporting resources.
citing papers explorer
-
TRACER: Verifiable Generative Provenance for Multimodal Tool-Using Agents
TRACER attaches verifiable sentence-level provenance records to multimodal agent outputs using tool-turn alignment and semantic relations, yielding 78.23% answer accuracy and fewer tool calls than baselines on TRACE-Bench.
-
SPS: Steering Probability Squeezing for Better Exploration in Reinforcement Learning for Large Language Models
SPS interleaves RL and IRL to counteract probability squeezing in LLM reasoning trajectories, improving Pass@k on five benchmarks while identifying an empirical upper bound on multi-sample performance.
-
From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models
A survey of credit assignment techniques in LLM reinforcement learning that distinguishes maturing methods for reasoning from new approaches needed for agentic settings and provides supporting resources.