Lightweight probes on LLM internal states verify reasoning steps as effectively as process reward models up to 810 times larger across math, planning, and QA tasks.
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9426–9439, Bangkok, Thailand
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
ReProbe: Efficient Test-Time Scaling of Multi-Step Reasoning by Probing Internal States of Large Language Models