Let’s verify step by step

Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe · 2023

1 Pith paper cites this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

A new benchmark uses separate predictor and scorer LLMs to test whether forecast strings improve likelihood of hidden mathematical equation continuations, with controls that detect priming shortcuts.

citing papers explorer

Showing 1 of 1 citing paper.

Likelihood scoring for continuations of mathematical text: a self-supervised benchmark with tests for shortcut vulnerabilities

cs.LG · 2026-05-11 · unverdicted · none · ref 20

