Step-level value preference optimization for mathematical reasoning

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

dataset 1

citation-polarity summary

fields

cs.AI 2

years

2026 1 2025 1

verdicts

UNVERDICTED 2

roles

dataset 1

polarities

use dataset 1

representative citing papers

Step-by-Step Optimization-like Reasoning in LLMs over Expanding Search Spaces

cs.AI · 2026-06-03 · unverdicted · novelty 7.0

Introduces OPT* tasks and two training regimes (solver-guided online policy optimization with rank-based reward shaping and search-based offline RL) plus a theoretical link between search success and information extraction per budget unit, showing empirical gains in optimization-like reasoning.
