Self-Training (Munkhbat et al., 2025) uses Best-of-N sampling to select the shortest correct reasoning path as training data

employs a binary search to find optimal token budgets and trains the model to follow these constraints · 2025

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Neural Chain-of-Thought Search: Searching the Optimal Reasoning Path to Enhance Large Language Models

cs.CL · 2026-01-16 · unverdicted · novelty 6.0

NCoTS treats chain-of-thought reasoning as a search problem and uses a dual-factor heuristic to find paths that are over 3.5% more accurate and 22% shorter on benchmarks.

citing papers explorer

Neural Chain-of-Thought Search: Searching the Optimal Reasoning Path to Enhance Large Language Models

cs.CL · 2026-01-16 · unverdicted · none · ref 19
