LLM-generated combinatorial solvers achieve highest correctness when the model formalizes problems for verified backends rather than attempting to optimize search, which often causes regressions.
Toolchain*: Efficient action space navigation in large language models with a* search
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
LLM-native figures embed provenance and enable direct LLM interaction with scientific visualizations to accelerate discovery and improve reproducibility.
FitText embeds memetic evolutionary retrieval inside the agent's reasoning loop to iteratively refine pseudo-tool descriptions, raising retrieval rank from 8.81 to 2.78 on ToolRet and pass rate to 0.73 on StableToolBench.
Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.
NaviAgent decouples task planning from tool execution via a Tool World Navigation Model graph to improve scalability and success rates in LLM agents handling large tool ecosystems.
A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.
citing papers explorer
-
Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers
LLM-generated combinatorial solvers achieve highest correctness when the model formalizes problems for verified backends rather than attempting to optimize search, which often causes regressions.
-
Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery
LLM-native figures embed provenance and enable direct LLM interaction with scientific visualizations to accelerate discovery and improve reproducibility.
-
FitText: Evolving Agent Tool Ecologies via Memetic Retrieval
FitText embeds memetic evolutionary retrieval inside the agent's reasoning loop to iteratively refine pseudo-tool descriptions, raising retrieval rank from 8.81 to 2.78 on ToolRet and pass rate to 0.73 on StableToolBench.
-
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Math reasoning gains in LLMs rarely transfer to general domains; RL tuning generalizes while SFT causes forgetting and representation drift.
-
NaviAgent: Bilevel Planning on Tool Navigation Graph for Large-Scale Orchestration
NaviAgent decouples task planning from tool execution via a Tool World Navigation Model graph to improve scalability and success rates in LLM agents handling large tool ecosystems.
-
A Survey on LLM-as-a-Judge
A survey on LLM-as-a-Judge that reviews reliability strategies, proposes evaluation methods, and introduces a novel benchmark for assessing such systems.