hub Mixed citations

arXiv preprint arXiv:2505.12346

Mixed citation behavior. Most common role is background (50%).

14 Pith papers citing it
Background 50% of classified citations

citation-role summary

background: 4 · method: 2

years

2026: 12 · 2025: 2

representative citing papers

Self-Distilled RLVR

cs.LG · 2026-04-03 · unverdicted · novelty 7.0

RLSD combines self-distillation, used for token-level policy-difference magnitudes, with RLVR, used for reliable update directions derived from response correctness, to reach higher convergence and better training stability.

Hölder Policy Optimisation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

HölderPO unifies token-level aggregation in GRPO via the Hölder mean with a tunable p parameter and annealing schedule, delivering 54.9% average accuracy on math benchmarks and 93.8% success on ALFWorld.
