The SWE-Bench Illusion. arXiv:2506.12286, June

7 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.SE 6 · cs.AI 1

years: 2026 6 · 2025 1

roles: other 1

polarities: unclear 1

representative citing papers

Evaluating Plan Compliance in Autonomous Programming Agents

cs.SE · 2026-04-13 · unverdicted · novelty 7.0

Autonomous programming agents frequently fail to follow instructed plans and fall back on incomplete internalized workflows. Standard plans and periodic reminders improve performance, but a poor plan can degrade it more than having no plan at all.

Reproduction Test Generation for Java SWE Issues

cs.SE · 2026-05-05 · unverdicted · novelty 6.0 · 2 refs

Introduces the first benchmark for Java reproduction test generation from repository issues and adapts a prior Python tool to achieve high performance on it.

Diagnosing CFG Interpretation in LLMs

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

LLMs maintain surface syntax for novel CFGs but fail to preserve semantics under recursion and branching, relying on keyword bootstrapping rather than pure symbolic reasoning.
