Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search
4 Pith papers cite this work.
citing papers explorer
- Step-by-Step Optimization-like Reasoning in LLMs over Expanding Search Spaces
  Introduces OPT* tasks and two training regimes (solver-guided online policy optimization with rank-based reward shaping and search-based offline RL) plus a theoretical link between search success and information extraction per budget unit, showing empirical gains in optimization-like reasoning.
- Language Models as Knowledge Bases?
  BERT stores relational knowledge extractable via cloze queries without fine-tuning and matches supervised baselines on open-domain QA tasks.
- Improved bounds for the double cap conjecture
  Improved upper bound α_3 ≤ 0.2953 for Witsenhausen's problem in dimension 3 via harmonic analysis, geometric fractional chromatic number, and a computer-searched 33-point set.
- From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR
  A two-stage OMR pipeline decodes symbol candidates into polyphonic score structures via topology recognition with probability-guided search.