Title resolution pending

$\mathrm{Cov}[\log \pi_\theta, A] < 0$: $\Delta H > 0$ (entropy ↑, policy more exploratory)
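
The sign relationship in the line above can be checked numerically. The following is a minimal sketch, not taken from the indexed work: it assumes a tabular softmax policy over four actions and a natural-policy-gradient-style step in which the logits move by `lr * advantage`. Under that assumption the first-order entropy change is `-lr * Cov[log π, A]`, with the covariance taken under the policy. The function names, logits, and learning rate are all hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    # Shannon entropy (in nats) of a strictly positive distribution.
    return float(-(p * np.log(p)).sum())

def cov_under(p, x, y):
    # Covariance of x and y when the action is drawn from p.
    mx, my = (p * x).sum(), (p * y).sum()
    return float((p * (x - mx) * (y - my)).sum())

def entropy_step(logits, adv, lr):
    # ASSUMPTION: the update shifts each logit by lr * advantage.
    p = softmax(logits)
    cov = cov_under(p, np.log(p), adv)
    new_p = softmax(logits + lr * adv)
    return cov, entropy(new_p) - entropy(p)

logits = np.array([2.0, 1.0, 0.0, -1.0])  # hypothetical policy logits
p = softmax(logits)
centered_logp = np.log(p) - (p * np.log(p)).sum()
lr = 1e-3

# Advantage that favors low-probability actions: covariance is negative.
cov_neg, dH_neg = entropy_step(logits, -centered_logp, lr)
# Advantage that favors high-probability actions: covariance is positive.
cov_pos, dH_pos = entropy_step(logits, centered_logp, lr)

print(f"Cov={cov_neg:+.4f}  dH={dH_neg:+.2e}")
print(f"Cov={cov_pos:+.4f}  dH={dH_pos:+.2e}")
```

In this toy setting, an advantage that rewards unlikely actions gives a negative covariance and raises entropy, while one that rewards already-likely actions does the opposite. The approximation is first order in `lr`, so it should only be trusted for small steps.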

1 Pith paper cites this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Sequence-Level Likelihood

cs.CL · 2026-04-14 · unverdicted · novelty 5.0

TEPO uses sequence-level likelihood for token-level reward aggregation and a KL mask on positive-advantage tokens to improve stability and performance over GRPO in mathematical reasoning.

citing papers explorer

Showing 1 of 1 citing paper.

Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Sequence-Level Likelihood

cs.CL · 2026-04-14 · unverdicted · none · ref 5
