ICT framework applies JS divergence to token logits to select critical tokens for selective RLVR updates, claiming 4.58% average pass@4 gains on Qwen2.5 models across seven reasoning benchmarks.
Enhancing large language model reasoning via selective critical token fine-tuning
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Beyond Entropy: Learning from Token-Level Distributional Deviations for LLM Reasoning
ICT framework applies JS divergence to token logits to select critical tokens for selective RLVR updates, claiming 4.58% average pass@4 gains on Qwen2.5 models across seven reasoning benchmarks.