Developers using AI showed the same core problem-solving behaviors as those without but differed in how they became stuck and recovered, with AI helping or hindering in specific cases.
Canonical reference
Hellendoorn, Bogdan Vasilescu, and Brad A
Canonical reference. 100% of citing Pith papers cite this work as background.
verdicts: UNVERDICTED 6
roles: background 5
polarities: background 5
representative citing papers
citing papers explorer
- ChatGPT: Friend or Foe When Comprehending and Changing Unfamiliar Code
  Developers using AI showed the same core problem-solving behaviors as those without but differed in how they became stuck and recovered, with AI helping or hindering in specific cases.
- Synthesizing Multi-Agent Harnesses for Vulnerability Discovery
  AgentFlow uses a typed graph DSL covering roles, prompts, tools, topology and protocol plus a runtime-signal feedback loop to optimize multi-agent harnesses, reaching 84.3% on TerminalBench-2 and discovering ten new zero-days in Chrome including two critical sandbox escapes.
- Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph
  EvoGraph turns linear AI-assisted programming into a manipulable graph of branching histories, reducing cognitive load and enabling better iteration according to a user study with 20 developers.
- Do AI Models Dream of Faster Code? An Empirical Study on LLM-Proposed Performance Improvements in Real-World Software
  LLMs propose volatile performance improvements on real-world Java tasks that lag human developers on average, showing algorithmic benchmarks overestimate capabilities.
- What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook
  Empirical analysis of 4707 MoltBook posts shows AI-only technical discourse focuses on security, trust, and abstract topics while lacking concrete runtime and project details found in human GitHub discussions.
- Raven: Rethinking Automated Assessment for Scratch Programs via Video-Grounded Evaluation
  Raven automates Scratch program assessment by having instructors specify task-level video generation rules and using LLMs to analyze resulting videos for behavioral compliance, outperforming prior tools on real student submissions.