Canonical reference
Beyond code generation: An observational study of ChatGPT usage in software engineering practice. Proc
Canonical reference. 83% of citing Pith papers cite this work as background.
citing papers explorer
- CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge
  CyberCertBench shows frontier LLMs reach human-expert performance on general IT and networking security but drop on vendor-specific and formal standards questions such as IEC 62443, with a new framework for producing interpretable explanations.
- Choose Your Own Adventure: Non-Linear AI-Assisted Programming with EvoGraph
  EvoGraph turns linear AI-assisted programming into a manipulable graph of branching histories, reducing cognitive load and enabling better iteration according to a user study with 20 developers.
- Guidelines for Empirical Studies in Software Engineering involving Large Language Models
  The paper delivers a taxonomy of seven LLM study types in software engineering along with eight guidelines that separate mandatory requirements from recommended practices to address reproducibility challenges.
- What Software Engineering Looks Like to AI Agents? -- An Empirical Study of AI-Only Technical Discourse on MoltBook
  Empirical analysis of 4707 MoltBook posts shows AI-only technical discourse focuses on security, trust, and abstract topics while lacking concrete runtime and project details found in human GitHub discussions.
- The Role of LLMs in Collaborative Software Design
  Exploratory lab study finds shared LLM use builds shared understanding in design teams while parallel use risks context drift, with professionals reflecting on outputs for insights but sometimes anchoring early.
- Personalizing LLM-Based Conversational Programming Assistants
  The paper describes ongoing efforts to characterize developer diversity in cognition and context and to use personalization to make LLM-based conversational programming assistants more inclusive.