Not All LLM Reasoners Are Created Equal. arXiv preprint arXiv:2410.01748

Arian Hosseini, Alessandro Sordoni, Daniel Toyama, Aaron Courville, Rishabh Agarwal · 2024 · arXiv 2410.01748

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 2

citation-polarity summary

background 1 · unclear 1

representative citing papers

LEAD: Breaking the No-Recovery Bottleneck in Long-Horizon Reasoning

cs.AI · 2026-03-06 · unverdicted · novelty 7.0

LEAD lets LLMs solve checkers jumping puzzles up to size 13 by using lookahead to recover from irreversible errors on hard steps that break extreme decomposition.

Agentic Frameworks for Reasoning Tasks: An Empirical Study

cs.AI · 2026-04-17 · unverdicted · novelty 6.0

An empirical evaluation of 22 agentic frameworks on BBH, GSM8K, and ARC benchmarks shows stable performance in 12 frameworks but highlights orchestration failures and weaker mathematical reasoning.

How Psychological Learning Paradigms Shaped and Constrained Artificial Intelligence

cs.CL · 2026-03-18 · unverdicted · novelty 5.0

AI's compositional reasoning failures originate in psychological learning paradigms that shaped its architectures, and the ReSynth trimodular framework is proposed to embed systematicity structurally.

Humanity's Last Exam

cs.LG · 2025-01-24 · unverdicted · novelty 5.0

Humanity's Last Exam is a new 2,500-question benchmark at the frontier of human knowledge where state-of-the-art LLMs show low accuracy.

citing papers explorer

LEAD: Breaking the No-Recovery Bottleneck in Long-Horizon Reasoning · cs.AI · 2026-03-06 · unverdicted · none · ref 2
Agentic Frameworks for Reasoning Tasks: An Empirical Study · cs.AI · 2026-04-17 · unverdicted · none · ref 62
How Psychological Learning Paradigms Shaped and Constrained Artificial Intelligence · cs.CL · 2026-03-18 · unverdicted · none · ref 35
Humanity's Last Exam · cs.LG · 2025-01-24 · unverdicted · none · ref 26
