In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24)
7 Pith papers cite this work.
citing papers explorer
- REAP: Automatic Curation of Coding Agent Benchmarks from Interactive Production Usage
  REAP automatically curates production-derived benchmarks for AI coding agents via LLM classification and stability checks, producing the Harvest benchmark with model solve rates of 42.9-58.2%.
- How Do Developers Interact with AI? An Exploratory Study on Modeling Developer Programming Behavior
  Developers using AI assistants exhibit more stable emotions and greater focus on code creation, evaluation, and verification, captured in a new four-dimensional S-IASE model from retrospective labeling of screen recordings, surveys, and interviews.
- Babbling Suppression: Making LLMs Greener One Token at a Time
  Babbling Suppression stops LLM code generation upon test passage to reduce token output and energy consumption by up to 65% across Python and Java benchmarks.
- The Impact of AI Coding Assistants on Software Engineering: A Longitudinal Study
  Longitudinal surveys show AI coding assistants reduce time on code writing but increase supervisory verification tasks, with stable productivity perceptions yet rising reports of worsened developer experience.
- Relationships Between Trust, Compliance, and Performance for Novice Programmers Using AI Code Generation
  Among novice programmers using AI code generators, trust did not predict compliance with suggestions, while performance correlated with both compliance and increased subsequent trust.
- Designing for Collective Access: In Search of a Solution to Accessible Communication in a Mixed-Ability Non-Profit
  A six-month qualitative study of a mixed-ability nonprofit finds that conflicting access needs in communication act as a generative process revealing power structures and enabling accountability and repair rather than serving as technical problems to eliminate.
- "If We Had the Information That We Need to Interpret the World Around Us, We Wouldn't Be Disabled:" Barriers and Opportunities in Information Work among Blind and Sighted Colleagues
  A qualitative study of mixed-ability teams identifies four types of interrelated failures and workarounds in information representation use, influenced by stigmas and social dynamics.