Canonical reference

Hellendoorn, Bogdan Vasilescu, and Brad A

Daye Nam, Andrew Macvean, Vincent Hellendoorn, Bogdan Vasilescu, Brad Myers · 2024 · arXiv 7503.36391

Canonical reference. 100% of citing Pith papers cite this work as background.

6 Pith papers citing it

Background 100% of classified citations

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

ChatGPT: Friend or Foe When Comprehending and Changing Unfamiliar Code

cs.SE · 2026-05-11 · unverdicted · novelty 7.0

Developers using AI showed the same core problem-solving behaviors as those without but differed in how they became stuck and recovered, with AI helping or hindering in specific cases.

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

AgentFlow uses a typed graph DSL covering roles, prompts, tools, topology and protocol plus a runtime-signal feedback loop to optimize multi-agent harnesses, reaching 84.3% on TerminalBench-2 and discovering ten new zero-days in Chrome including two critical sandbox escapes.

Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph

cs.HC · 2026-04-20 · unverdicted · novelty 7.0

EvoGraph turns linear AI-assisted programming into a manipulable graph of branching histories, reducing cognitive load and enabling better iteration according to a user study with 20 developers.

Do AI Models Dream of Faster Code? An Empirical Study on LLM-Proposed Performance Improvements in Real-World Software

cs.SE · 2025-10-17 · unverdicted · novelty 7.0

LLMs propose volatile performance improvements on real-world Java tasks that lag human developers on average, showing algorithmic benchmarks overestimate capabilities.

What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook

cs.SE · 2026-05-08 · unverdicted · novelty 6.0

Empirical analysis of 4707 MoltBook posts shows AI-only technical discourse focuses on security, trust, and abstract topics while lacking concrete runtime and project details found in human GitHub discussions.

Raven: Rethinking Automated Assessment for Scratch Programs via Video-Grounded Evaluation

cs.SE · 2026-04-20 · unverdicted · novelty 6.0

Raven automates Scratch program assessment by having instructors specify task-level video generation rules and using LLMs to analyze resulting videos for behavioral compliance, outperforming prior tools on real student submissions.

citing papers explorer

Showing 6 of 6 citing papers.

ChatGPT: Friend or Foe When Comprehending and Changing Unfamiliar Code · cs.SE · 2026-05-11 · unverdicted · none · ref 34
Synthesizing Multi-Agent Harnesses for Vulnerability Discovery · cs.CR · 2026-04-22 · unverdicted · none · ref 41
Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph · cs.HC · 2026-04-20 · unverdicted · none · ref 40
Do AI Models Dream of Faster Code? An Empirical Study on LLM-Proposed Performance Improvements in Real-World Software · cs.SE · 2025-10-17 · unverdicted · none · ref 27
What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook · cs.SE · 2026-05-08 · unverdicted · none · ref 11
Raven: Rethinking Automated Assessment for Scratch Programs via Video-Grounded Evaluation · cs.SE · 2026-04-20 · unverdicted · none · ref 36
