Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection. arXiv preprint arXiv:2506.04695, 2025

Xingwu Chen, Tianle Li, Difan Zou · 2025 · arXiv 2506.04695

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Not only where, But when: Temporal Scheduling for RLVR

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

Temporal scheduling of credit allocation criteria over RLVR training, using trajectory percentiles to target heterogeneous behaviors, yields more stable policy entropy and better reasoning benchmark results than static allocation.

Which Tokens Matter? Adaptive Token Selection for RLVR with the Relative Surprisal Index

cs.AI · 2026-06-30 · unverdicted · novelty 6.0

Introduces RSI metric and RSI-S filtering method for adaptive token selection in RLVR, reporting 2-3 point gains over GRPO on AIME/AMC benchmarks.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Not only where, But when: Temporal Scheduling for RLVR

cs.LG · 2026-05-25 · unverdicted · none · ref 24

Temporal scheduling of credit allocation criteria over RLVR training, using trajectory percentiles to target heterogeneous behaviors, yields more stable policy entropy and better reasoning benchmark results than static allocation.
