Mohammad Javad Hosseini, Hannaneh Hajishirzi, Oren Etzioni, and Nate Kushman
3 Pith papers cite this work. Polarity classification is still being indexed.
citing papers explorer
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
  LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating that they replicate training patterns rather than perform genuine logical reasoning (see the variant-generation sketch after this list).
- Hallucination is Inevitable: An Innate Limitation of Large Language Models
  Hallucinations are inevitable in LLMs because, by a learning-theoretic argument, no such family of models can learn every computable function (see the diagonalization sketch after this list).
- Disentangling Mathematical Reasoning in LLMs: A Methodological Investigation of Internal Mechanisms
  Proficient LLMs detect arithmetic tasks in early layers but produce correct answers only in the final layers, with attention and MLP modules dividing labor in a way absent from less proficient models (see the layer-probing sketch after this list).
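
To make the first claim concrete, here is a minimal sketch of how GSM-Symbolic-style variants can be generated: names and numbers in a grade-school template are resampled while the underlying arithmetic stays fixed, so an accuracy drop across variants points to pattern replication rather than reasoning. The template, name pool, and value ranges below are illustrative assumptions, not drawn from the benchmark itself.

```python
import random

# Illustrative GSM-Symbolic-style perturbation: names and numbers in a
# grade-school template are resampled while the underlying arithmetic is
# unchanged. The template, names, and value ranges are hypothetical.
TEMPLATE = ("{name} picks {x} apples on Monday and {y} apples on Tuesday. "
            "How many apples does {name} have in total?")
NAMES = ["Sophie", "Liam", "Ava", "Noah"]

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Instantiate the template and return (question, ground-truth answer)."""
    x, y = rng.randint(2, 20), rng.randint(2, 20)
    name = rng.choice(NAMES)
    return TEMPLATE.format(name=name, x=x, y=y), x + y

rng = random.Random(0)
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)
```

A model that truly executes the arithmetic should be insensitive to which names and numbers were drawn; the variance the paper reports is measured across many such instantiations of the same template.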
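
The hallucination result rests on a diagonalization argument from computability theory. The toy below, using hypothetical stand-in "models", shows the core move the paper formalizes: against any fixed, enumerable family of total functions one can define a computable ground truth that every member gets wrong somewhere.

```python
from typing import Callable, List

# Toy diagonalization: given any fixed, enumerable family of total functions
# (stand-ins for candidate models), define a computable ground truth that
# disagrees with model i on input i, so no model in the family computes it.
models: List[Callable[[int], int]] = [
    lambda n: n,        # hypothetical model 0
    lambda n: n * n,    # hypothetical model 1
    lambda n: n + 7,    # hypothetical model 2
]

def diagonal_truth(n: int) -> int:
    """Computable, yet provably not equal to any function in `models`."""
    return models[n](n) + 1 if n < len(models) else 0

for i, m in enumerate(models):
    assert diagonal_truth(i) != m(i)  # model i errs ("hallucinates") on input i
print("every listed model disagrees with the ground truth somewhere")
```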
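
The third claim concerns where in the network the answer emerges. A common way to probe this is a logit-lens-style pass that decodes each layer's hidden state through the output head; the sketch below assumes the Hugging Face transformers library with GPT-2 as a stand-in model and is not the cited paper's actual methodology.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal logit-lens-style probe: decode each layer's hidden state through the
# model's output head to see at which depth the eventual answer becomes the
# top next-token prediction. GPT-2 and the prompt are illustrative stand-ins.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("2 + 3 =", return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
    # hidden_states holds the embedding output plus one tensor per block
    for layer, h in enumerate(out.hidden_states):
        h_last = h[:, -1, :]  # representation at the final token position
        if layer < len(out.hidden_states) - 1:
            h_last = model.transformer.ln_f(h_last)  # last entry is pre-normed
        top_id = model.lm_head(h_last).argmax(dim=-1)
        print(f"layer {layer:2d}: top prediction = {tok.decode(top_id)!r}")
```

If a model is proficient in the paper's sense, the correct digit should dominate only near the final layers even though the task is recognizable much earlier.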