Title resolution pending

Maliheh Izadi, Jonathan Katzy, Tim Van Dam, Marc Otten, Razvan Mihai Popescu, Arie Van Deursen · 2024

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

LLM agents reach only 35% average checkpoint completion on ten realistic CTF challenges in a new open benchmark with automated partial-credit scoring.

Agentic Much? Adoption of Coding Agents on GitHub

cs.SE · 2026-01-26 · conditional · novelty 7.0

Coding agents reached 22-29% adoption in GitHub projects within months of release, with agent-assisted commits larger and focused on features and bug fixes.

Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

cs.SE · 2026-05-06 · accept · novelty 6.0

A review of 114 studies creates taxonomies for code and data quality issues, formalizes 18 propagation mechanisms from training data defects to LLM-generated code defects, and synthesizes detection and mitigation techniques.

RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices

cs.SE · 2026-04-24 · unverdicted · novelty 6.0

RealBench is a new repo-level code generation benchmark that adds UML diagrams to natural language specs. It shows that LLMs struggle more with full repositories, produce modules with errors, and perform best with whole-repo generation on small projects but with module-by-module generation on complex ones.

citing papers explorer


Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges

cs.AI · 2026-04-21 · unverdicted · none · ref 11

Agentic Much? Adoption of Coding Agents on GitHub

cs.SE · 2026-01-26 · conditional · none · ref 18

Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

cs.SE · 2026-05-06 · accept · none · ref 47

RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices

cs.SE · 2026-04-24 · unverdicted · none · ref 22
