Reshaping reasoning in llms: A theoretical analysis of rl training dynamics through pattern selection.arXiv preprint arXiv:2506.04695,

Xingwu Chen, Tianle Li, Difan Zou · arXiv 2506.04695

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

Not only where, But when: Temporal Scheduling for RLVR

cs.LG · 2026-05-25 · unverdicted · novelty 7.0

Temporal scheduling of credit allocation criteria over RLVR training, using trajectory percentiles to target heterogeneous behaviors, yields more stable policy entropy and better reasoning benchmark results than static allocation.

citing papers explorer

Showing 1 of 1 citing paper.

Not only where, But when: Temporal Scheduling for RLVR cs.LG · 2026-05-25 · unverdicted · none · ref 24
Temporal scheduling of credit allocation criteria over RLVR training, using trajectory percentiles to target heterogeneous behaviors, yields more stable policy entropy and better reasoning benchmark results than static allocation.

Reshaping reasoning in llms: A theoretical analysis of rl training dynamics through pattern selection.arXiv preprint arXiv:2506.04695,

fields

years

verdicts

representative citing papers

citing papers explorer