Weak models used as critics supplying non-misleading revision directions, distilled on-policy via OPCD, improve frozen and trained strong models on reasoning and alignment benchmarks.
W2S-AlignTree: Weak-to-strong inference-time alignment for large language models via Monte Carlo tree search. arXiv preprint arXiv:2511.11518,
1 Pith paper cites this work. Polarity classification is still indexing.
Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight