Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models

Priyan Vaithilingam, Tianyi Zhang, Elena L · 2022 · arXiv 1101.351966

Canonical reference. 88% of citing Pith papers cite this work as background.

17 Pith papers citing it

Background 88% of classified citations

citation-role summary

background: 8

citation-polarity summary

background: 7 · unclear: 1

representative citing papers

ChatGPT: Friend or Foe When Comprehending and Changing Unfamiliar Code

cs.SE · 2026-05-11 · unverdicted · novelty 7.0

Developers using AI showed the same core problem-solving behaviors as those without but differed in how they became stuck and recovered, with AI helping or hindering in specific cases.

CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge

cs.CR · 2026-04-22 · unverdicted · novelty 7.0

CyberCertBench shows frontier LLMs reach human-expert performance on general IT and networking security but drop on vendor-specific and formal standards questions such as IEC 62443, with a new framework for producing interpretable explanations.

Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph

cs.HC · 2026-04-20 · unverdicted · novelty 7.0

EvoGraph turns linear AI-assisted programming into a manipulable graph of branching histories, reducing cognitive load and enabling better iteration according to a user study with 20 developers.

REAP: Automatic Curation of Coding Agent Benchmarks from Interactive Production Usage

cs.SE · 2026-04-02 · unverdicted · novelty 7.0

REAP automatically curates production-derived benchmarks for AI coding agents via LLM classification and stability checks, producing the Harvest benchmark with model solve rates of 42.9-58.2%.

Journeys of Parents with LGBTQ+ Children: How Trauma and Healing Reshape Identity and (Mis)Informating Practices

cs.HC · 2026-05-19 · unverdicted · novelty 6.0

A qualitative study of South Korean parents shows that trauma and healing after learning a child is LGBTQ+ leads to identity reconstruction as supportive parents and more critical, protective informating practices.

What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook

cs.SE · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

Empirical analysis of 4707 MoltBook posts shows AI-only technical discourse focuses on security, trust, and abstract topics while lacking concrete runtime and project details found in human GitHub discussions.

Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation

cs.SE · 2026-04-20 · unverdicted · novelty 6.0

Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.

Designing Around Stigma: Human-Centered LLMs for Menstrual Health

cs.HC · 2026-04-07 · unverdicted · novelty 6.0

Researchers created a stigma-aware WhatsApp chatbot for menstrual health education in Pakistan through co-design workshops and a two-week deployment, yielding insights on its use for challenging taboos alongside tensions around trust and cultural explanations.

Decision-Oriented Programming with Aporia

cs.HC · 2026-04-06 · conditional · novelty 6.0

Aporia makes design decisions explicit and interactive in AI-assisted programming, leading to higher engagement and 5x fewer mental model disagreements with code in a 14-person user study compared to a baseline agent.

Polite But Boring? Trade-offs Between Engagement and Psychological Reactance to Chatbot Feedback Styles

cs.HC · 2026-01-28 · unverdicted · novelty 6.0

Polite chatbot feedback lowers psychological reactance and boosts behavioral intentions but lacks engagement, whereas verbal leakage heightens surprise and engagement at the expense of increased reactance.

PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes

cs.SE · 2025-05-12 · conditional · novelty 6.0

Empirical analysis of 338 PRs with self-admitted ChatGPT usage shows low full integration (median 25%), selective adaptation patterns, and broader influence on developer reasoning during reviews.

The Impact of AI Coding Assistants on Software Engineering: A Longitudinal Study

cs.SE · 2026-05-22 · unverdicted · novelty 5.0

Longitudinal surveys show AI coding assistants reduce time on code writing but increase supervisory verification tasks, with stable productivity perceptions yet rising reports of worsened developer experience.

Relationships Between Trust, Compliance, and Performance for Novice Programmers Using AI Code Generation

cs.HC · 2026-04-21 · unverdicted · novelty 5.0

Among novice programmers using AI code generators, trust did not predict compliance with suggestions, while performance correlated with both compliance and increased subsequent trust.

"If You're Very Clever, No One Knows You've Used It": The Social Dynamics of Developing Generative AI Literacy in the Workplace

cs.HC · 2026-02-01 · accept · novelty 5.0

Hiding generative AI use to signal expertise reduces knowledge sharing and transparency among workplace colleagues.

"Should I Give Up Now?" Investigating LLM Pitfalls in Software Engineering

cs.SE · 2024-11-15 · conditional · novelty 5.0

User study reveals nine LLM failure categories in SE tasks and quantifies abandonment factors from 26 participants.

Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models

cs.SE · 2024-11-16 · unverdicted · novelty 3.0

Smaller LLMs produce functional but limited Python code with variable quantization effects and quality/maintainability concerns that require validation before use.

Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks

cs.SE · 2024-10-01 · unverdicted · novelty 3.0

A survey of user studies on LLM use in programming that identifies interaction behaviors, mixed benefits and weaknesses, and factors influencing human and task performance.

citing papers explorer

Showing 17 of 17 citing papers.

ChatGPT: Friend or Foe When Comprehending and Changing Unfamiliar Code · cs.SE · 2026-05-11 · unverdicted · none · ref 47
CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge · cs.CR · 2026-04-22 · unverdicted · none · ref 1
Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph · cs.HC · 2026-04-20 · unverdicted · none · ref 51
REAP: Automatic Curation of Coding Agent Benchmarks from Interactive Production Usage · cs.SE · 2026-04-02 · unverdicted · none · ref 13
Journeys of Parents with LGBTQ+ Children: How Trauma and Healing Reshape Identity and (Mis)Informating Practices · cs.HC · 2026-05-19 · unverdicted · none · ref 98
What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook · cs.SE · 2026-05-08 · unverdicted · none · ref 13 · 2 links
Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation · cs.SE · 2026-04-20 · unverdicted · none · ref 33
Designing Around Stigma: Human-Centered LLMs for Menstrual Health · cs.HC · 2026-04-07 · unverdicted · none · ref 121
Decision-Oriented Programming with Aporia · cs.HC · 2026-04-06 · conditional · none · ref 54
Polite But Boring? Trade-offs Between Engagement and Psychological Reactance to Chatbot Feedback Styles · cs.HC · 2026-01-28 · unverdicted · none · ref 21
PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes · cs.SE · 2025-05-12 · conditional · none · ref 94
The Impact of AI Coding Assistants on Software Engineering: A Longitudinal Study · cs.SE · 2026-05-22 · unverdicted · none · ref 47
Relationships Between Trust, Compliance, and Performance for Novice Programmers Using AI Code Generation · cs.HC · 2026-04-21 · unverdicted · none · ref 13
"If You're Very Clever, No One Knows You've Used It": The Social Dynamics of Developing Generative AI Literacy in the Workplace · cs.HC · 2026-02-01 · accept · none · ref 125
"Should I Give Up Now?" Investigating LLM Pitfalls in Software Engineering · cs.SE · 2024-11-15 · conditional · none · ref 16
Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models · cs.SE · 2024-11-16 · unverdicted · none · ref 18
Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks · cs.SE · 2024-10-01 · unverdicted · none · ref 95
