Title resolution pending

Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik R Narasimhan · 2024

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

PlayCoder: Making LLM-Generated GUI Code Playable

cs.SE · 2026-04-21 · conditional · novelty 7.0

PlayCoder raises the rate of LLM-generated GUI apps that can be played end-to-end without logic errors from near zero to 20.3% Play@3 by adding repository-aware generation, agent-driven testing, and iterative repair.

When Agents Fail: A Comprehensive Study of Bugs in LLM Agents with Automated Labeling

cs.SE · 2026-01-21 · unverdicted · novelty 7.0

A large-scale empirical study categorizes bugs in LLM agents and demonstrates that a specialized LLM agent can annotate them accurately at very low cost.

SWE-QA: Can Language Models Answer Repository-level Code Questions?

cs.CL · 2025-09-18 · unverdicted · novelty 7.0

SWE-QA introduces a repository-level code QA benchmark with 576 pairs and an agentic LLM framework; results show promise but leave open challenges for models handling complex codebases.

citing papers explorer

PlayCoder: Making LLM-Generated GUI Code Playable · cs.SE · 2026-04-21 · conditional · none · ref 28
When Agents Fail: A Comprehensive Study of Bugs in LLM Agents with Automated Labeling · cs.SE · 2026-01-21 · unverdicted · none · ref 43
SWE-QA: Can Language Models Answer Repository-level Code Questions? · cs.CL · 2025-09-18 · unverdicted · none · ref 15
