arXiv preprint arXiv:2505.23585

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

fields: cs.LG 8

years: 2026 7, 2025 1

polarities: background 2

representative citing papers

Hölder Policy Optimisation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

HölderPO unifies token-level aggregation in GRPO via the Hölder mean with a tunable p parameter and annealing schedule, delivering 54.9% average accuracy on math benchmarks and 93.8% success on ALFWorld.

Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective

cs.LG · 2025-10-11 · unverdicted · novelty 5.0

Derives a token-level approximation of entropy change that reveals four contributing factors, identifies limitations in prior entropy interventions, and proposes STEER, which adaptively reweights tokens to mitigate entropy collapse and improve performance on math and coding benchmarks.
