In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24)
12 Pith papers cite this work.
citing papers explorer
- REAP: Automatic Curation of Coding Agent Benchmarks from Interactive Production Usage
  REAP automatically curates production-derived benchmarks for AI coding agents via LLM classification and stability checks, producing the Harvest benchmark with model solve rates of 42.9-58.2%.
- How Do Developers Interact with AI? An Exploratory Study on Modeling Developer Programming Behavior
  Developers using AI assistants exhibit more stable emotions and greater focus on code creation, evaluation, and verification, captured in a new four-dimensional S-IASE model from retrospective labeling of screen recordings, surveys, and interviews.
- Developers' Experience with Generative AI Beyond Productivity Assessment -- Insights from an Empirical Mixed-Methods Field Study
  Mixed-methods study shows developers prefer GenAI for repetitive tasks, benefit from single interaction modes but not combined ones, and gain awareness from study participation.
- The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation
  Incidental prompt cues induce large, systematic shifts in the algorithm families chosen by LLMs during code generation across thousands of controlled trials.
- From Prompting to Verification: How Experience Shapes Vibe Coding Practices
  Survey of 162 vibe coders finds perceptions of AI code quality similar across experience levels but motivations, interaction styles, and quality assurance practices diverge, revealing a perception-action gap.
- Babbling Suppression: Making LLMs Greener One Token at a Time
  Babbling Suppression stops LLM code generation upon test passage to reduce token output and energy consumption by up to 65% across Python and Java benchmarks.
- The Impact of AI Coding Assistants on Software Engineering: A Longitudinal Study
  Longitudinal surveys show AI coding assistants reduce time on code writing but increase supervisory verification tasks, with stable productivity perceptions yet rising reports of worsened developer experience.
- Using Biometrics to Understand AI-Assisted Coding Performance and its Perception
  A multisite biometric study finds lower cognitive engagement under AI assistance via EEG and blink rate, with physiological-performance links present only in the non-AI condition.
- Relationships Between Trust, Compliance, and Performance for Novice Programmers Using AI Code Generation
  Among novice programmers using AI code generators, trust did not predict compliance with suggestions, while performance correlated with both compliance and increased subsequent trust.
- A Model of Integrated Information Processing in Human-AI Interaction
  The IIP model is a cybernetic framework representing humans and AI as coupled control loops whose efficacy depends on input adequacy, reference consonance, and output operativity to guide interface design.
- Designing for Collective Access: In Search of a Solution to Accessible Communication in a Mixed-Ability Non-Profit
  A six-month qualitative study of a mixed-ability nonprofit finds that conflicting access needs in communication act as a generative process revealing power structures and enabling accountability and repair rather than serving as technical problems to eliminate.
- "If We Had the Information That We Need to Interpret the World Around Us, We Wouldn't Be Disabled:" Barriers and Opportunities in Information Work among Blind and Sighted Colleagues
  A qualitative study of mixed-ability teams identifies four types of interrelated failures and workarounds in information representation use, influenced by stigmas and social dynamics.