arXiv preprint arXiv:2503.21961
2 Pith papers cite this work.
citing papers explorer
- Taming the Thinker: Conditional Entropy Shaping for Adaptive LLM Reasoning
  CES applies conditional bidirectional entropy control on top of DAPO to improve accuracy and shorten responses on mathematical benchmarks for 7B and 1.5B LLMs.
- Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search
  Chain-in-Tree cuts token use, model calls, and runtime by 75-85% in LLM tree search on GSM8K and Math500 by using simple branching-necessity checks, with little accuracy loss in most cases.