GTPO and GRPO-S: Token and sequence-level reward shaping with policy entropy. arXiv preprint arXiv:2508.04349

7 Pith papers cite this work. Polarity classification is still indexing.

years

2026: 6 · 2025: 1

roles

background: 2

polarities

background: 1 · unclear: 1

representative citing papers

EGM: Efficient Visual Grounding Language Models

cs.CV · 2026-01-20 · unverdicted · novelty 6.0

EGM enables 8B VLMs to reach 91.4 IoU on RefCOCO at 737 ms latency, outperforming a 235B model at 4320 ms, by substituting volume of mid-quality tokens for model scale.

Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective

cs.LG · 2025-10-11 · unverdicted · novelty 5.0

Derives a token-level approximation of entropy change that reveals four contributing factors, identifies limitations in prior entropy interventions, and proposes STEER, which adaptively reweights tokens to mitigate entropy collapse and improve performance on math and coding benchmarks.
