Title resolution pending

Rob Kopel · 2025

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Evaluating LLMs Code Reasoning Under Real-World Context

cs.SE · 2026-04-14 · unverdicted · novelty 7.0

R2Eval is a new benchmark with 135 real-world code reasoning problems from Python projects that preserves complex data structures for more realistic LLM evaluation.

Evaluating Code Reasoning Abilities of Large Language Models Under Real-World Settings

cs.SE · 2025-12-16 · unverdicted · novelty 7.0

A new dataset and nine-metric majority-vote procedure show that existing code-reasoning benchmarks are dominated by lower-complexity problems that do not reflect real-world code.

citing papers explorer

Showing 2 of 2 citing papers.

Evaluating LLMs Code Reasoning Under Real-World Context cs.SE · 2026-04-14 · unverdicted · none · ref 14
R2Eval is a new benchmark with 135 real-world code reasoning problems from Python projects that preserves complex data structures for more realistic LLM evaluation.
Evaluating Code Reasoning Abilities of Large Language Models Under Real-World Settings cs.SE · 2025-12-16 · unverdicted · none · ref 30
A new dataset and nine-metric majority-vote procedure show that existing code-reasoning benchmarks are dominated by lower-complexity problems that do not reflect real-world code.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer