LLM-generated combinatorial solvers achieve highest correctness when the model formalizes problems for verified backends rather than attempting to optimize search, which often causes regressions.
NPH ard E val: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
MixRea benchmark reveals LLMs achieve at most 42.8% consistency on explicit-implicit reasoning tasks, with PRCP prompting proposed to recover overlooked relations.
citing papers explorer
-
Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers
LLM-generated combinatorial solvers achieve highest correctness when the model formalizes problems for verified backends rather than attempting to optimize search, which often causes regressions.
-
MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models
MixRea benchmark reveals LLMs achieve at most 42.8% consistency on explicit-implicit reasoning tasks, with PRCP prompting proposed to recover overlooked relations.