arXiv preprint arXiv:2504.05518

Rem Yang, Julian Dai, Nikos Vasilakis, Martin Rinard · 2025 · arXiv 2504.05518

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Evaluating Code Reasoning Abilities of Large Language Models Under Real-World Settings

cs.SE · 2025-12-16 · unverdicted · novelty 7.0

A new dataset and nine-metric majority-vote procedure show that existing code-reasoning benchmarks are dominated by lower-complexity problems that do not reflect real-world code.

Diagnosing CFG Interpretation in LLMs

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

LLMs maintain surface syntax for novel CFGs but fail to preserve semantics under recursion and branching, relying on keyword bootstrapping rather than pure symbolic reasoning.

citing papers explorer

Showing 2 of 2 citing papers.

Evaluating Code Reasoning Abilities of Large Language Models Under Real-World Settings

cs.SE · 2025-12-16 · unverdicted · none · ref 69

A new dataset and nine-metric majority-vote procedure show that existing code-reasoning benchmarks are dominated by lower-complexity problems that do not reflect real-world code.

Diagnosing CFG Interpretation in LLMs

cs.AI · 2026-04-22 · unverdicted · none · ref 31

LLMs maintain surface syntax for novel CFGs but fail to preserve semantics under recursion and branching, relying on keyword bootstrapping rather than pure symbolic reasoning.
