The data-generation, checkpointing, and evaluation infrastructure is already in place; the extensions require only additional compute, and we will include them in a revised version

These robustness checks, particularly the temperature sweep, which controls for the possibility that RL training mechanically lowers the output entropy and thereby inflates PASS@(1,T) while deflating PASS@(64,T), are the most important next · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis

cs.LG · 2026-04-16 · unverdicted · novelty 7.0

RL expands the capability boundary of LLM agents on compositional tool-use tasks, shown by non-converging pass curves at large k with increasing T, while SFT regresses it and the effect is absent on simpler tasks.

citing papers explorer

Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis

cs.LG · 2026-04-16 · unverdicted · none · ref 24