The Impact of AI on Developer Productivity: Evidence from GitHub Copilot

Eirini Kalliamvakou; Mert Demirer; Peter Cihon; Sida Peng

arxiv: 2302.06590 · v1 · submitted 2023-02-13 · 💻 cs.SE

Sida Peng, Eirini Kalliamvakou, Peter Cihon, Mert Demirer

Pith reviewed 2026-05-24 09:06 UTC · model grok-4.3

classification 💻 cs.SE

keywords GitHub Copilot · AI pair programmer · developer productivity · controlled experiment · task completion time · generative AI · software development · HTTP server implementation

The pith

Developers with access to GitHub Copilot completed a coding task 55.8 percent faster than those without it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports a controlled experiment in which recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible. The group given access to the AI pair programmer finished the task 55.8 percent faster on average than the control group. Observed differences across participants suggest the tool may be especially helpful for people moving into software development roles. A sympathetic reader would care because the result supplies direct evidence on whether generative AI tools deliver measurable productivity gains in a realistic coding setting.

Core claim

In a controlled experiment, software developers with access to GitHub Copilot implemented an HTTP server in JavaScript 55.8 percent faster than the control group without access. Heterogeneous effects indicate that the tool shows promise for helping people transition into software development careers.
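
The headline figure is easy to misread, so a small worked example may help. The sketch below assumes that "55.8 percent faster" means a relative reduction in mean completion time, which is an interpretation and not something this page states explicitly. The group means are hypothetical round numbers chosen only to reproduce the percentage; they are not the paper's data, and the function names are illustrative.

```python
def percent_time_reduction(control_minutes: float, treated_minutes: float) -> float:
    """Relative reduction in completion time, as a percentage of the control time."""
    if control_minutes <= 0:
        raise ValueError("control_minutes must be positive")
    return 100.0 * (control_minutes - treated_minutes) / control_minutes


def implied_speed_ratio(reduction_percent: float) -> float:
    """Work-rate multiplier implied by a given percentage reduction in time."""
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction_percent must be in [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


# Hypothetical group means in minutes (NOT taken from the paper).
control_mean = 100.0
treated_mean = 44.2

reduction = percent_time_reduction(control_mean, treated_mean)
print(f"time reduction: {reduction:.1f}%")
print(f"implied work rate: {implied_speed_ratio(reduction):.2f}x")
```

Under this reading, a 55.8 percent reduction in time corresponds to working at roughly 2.26 times the control group's rate, which is a larger-sounding number for the same result.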

What carries the argument

The controlled experiment measuring task completion time for implementing an HTTP server in JavaScript, with random assignment to treatment (AI access) or control.
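
Random assignment is what lets the time difference be attributed to the tool instead of to who chose to use it. The following is a minimal sketch of simple complete randomization, assuming an even split and a fixed seed for reproducibility. This page does not describe the paper's actual randomization procedure, so `assign_groups` and its parameters are hypothetical.

```python
import random


def assign_groups(participant_ids, seed=0):
    """Shuffle participants and split them into treatment and control groups.

    With an odd number of participants, the control group gets the extra one.
    """
    rng = random.Random(seed)          # seeded so the assignment can be audited
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),   # would receive AI access
        "control": sorted(shuffled[half:]),     # would work without it
    }


# Hypothetical participant identifiers.
groups = assign_groups(range(1, 11), seed=42)
print(groups)
```

Because assignment depends only on the seeded shuffle, no participant characteristic such as experience or motivation can systematically influence which group a person lands in.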

If this is right

AI pair programmers can produce large reductions in time to complete certain coding tasks.
The productivity benefit appears larger for some participants, pointing to value in supporting career entry.
Generative AI tools can increase measured output in software development settings.
Access to the tool did not eliminate the need for developer skill but accelerated progress on the assigned task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the speedup holds across a wider range of real-world projects, organizations might adjust hiring or training timelines.
The result leaves open whether similar gains appear in collaborative or long-running codebases rather than isolated tasks.
Productivity metrics based on single-task speed may understate or overstate effects once code quality and maintenance are included.

Load-bearing premise

The chosen task of implementing an HTTP server in JavaScript and the experimental controls are representative enough that the measured speed difference reflects real productivity gains from the AI tool.

What would settle it

A replication using a different coding task, such as implementing a web application feature or working in another language, that finds no significant time reduction or a reduction below 20 percent would undermine the central claim.

Original abstract

Generative AI tools hold promise to increase human productivity. This paper presents results from a controlled experiment with GitHub Copilot, an AI pair programmer. Recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible. The treatment group, with access to the AI pair programmer, completed the task 55.8% faster than the control group. Observed heterogenous effects show promise for AI pair programmers to help people transition into software development careers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Controlled experiment shows 55.8% faster task completion with Copilot, though the single narrow task raises external validity issues.

The main thing to know about this paper is that it reports a controlled experiment in which the group with GitHub Copilot access completed an HTTP server in JavaScript 55.8% faster than the control group. The authors also examine heterogeneous effects related to career transitions.

What is new is the specific experimental result for Copilot on this task: the paper moves beyond general discussion of AI tools to a measured difference in completion time. Its strength is the treatment-control design, which allows a direct comparison instead of relying on observational data.

The soft spots are around generalizability. The task is a self-contained coding exercise in a controlled setting, which may not reflect productivity in typical development environments with existing code, debugging, or team coordination. The abstract provides no information on sample size, randomization procedure, or statistical significance, making it difficult to assess the reliability of the 55.8% figure. The link to helping people enter software development careers appears to be an extension beyond the main result.

This paper is for researchers interested in empirical studies of AI coding assistants. A reader focused on early evidence for tool adoption could find the number useful as a benchmark for similar tasks. I would recommend sending it for peer review so that referees can evaluate the methods section and any robustness checks.

Referee Report

2 major / 1 minor

Summary. The paper reports a controlled experiment in which recruited software developers implemented an HTTP server in JavaScript as quickly as possible. Developers with access to GitHub Copilot (treatment) completed the task 55.8% faster than the control group. The paper additionally reports heterogeneous effects and suggests implications for helping people transition into software development careers.

Significance. If the measured time difference is robust, the result supplies one of the first controlled, quantitative estimates of productivity gains from an AI pair programmer on a coding task. The experimental design itself is a strength for internal validity.

major comments (2)

[§3] §3 (Experimental Design): The central 55.8% speedup claim rests on a single greenfield task (implementing an HTTP server in JavaScript). No replication across other task types (e.g., debugging legacy code, requirements iteration, or non-coding activities) is described, so the result does not directly support the title's broader claim of impact on 'developer productivity'.
[§4] §4 (Results): The abstract states the 55.8% figure but supplies no sample size, randomization details, exclusion criteria, or statistical test results. These elements are required to evaluate whether the observed difference is statistically reliable and not driven by small-N artifacts or selection.
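
To make the second comment concrete, here is a sketch of one test a reader could run on per-participant completion times to judge whether a gap between groups is larger than chance would produce. It is a two-sided permutation test on the difference in means. The choice of test is an assumption for illustration, not the paper's reported method, and the data are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for the difference in mean completion time."""
    rng = random.Random(seed)
    observed = mean(control) - mean(treated)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        # Re-deal the group labels at random and recompute the gap.
        rng.shuffle(pooled)
        diff = mean(pooled[n_treated:]) - mean(pooled[:n_treated])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical completion times in minutes (NOT the paper's data).
treated_times = [62, 70, 75, 58, 81, 66]
control_times = [150, 171, 140, 165, 158, 180]
print(f"p = {permutation_p_value(treated_times, control_times):.4f}")
```

With very small groups the smallest attainable p-value is limited by the number of distinct label assignments, which is one reason the referee asks for the sample size alongside the test result.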

minor comments (1)

[Abstract] Abstract: 'heterogenous' is misspelled; should be 'heterogeneous'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and propose revisions where appropriate to improve clarity and precision.

Referee: [§3] §3 (Experimental Design): The central 55.8% speedup claim rests on a single greenfield task (implementing an HTTP server in JavaScript). No replication across other task types (e.g., debugging legacy code, requirements iteration, or non-coding activities) is described, so the result does not directly support the title's broader claim of impact on 'developer productivity'.

Authors: We agree that the experiment examines a single, controlled task chosen to maximize internal validity. This task requires implementing core functionality, handling requests, and basic testing, which captures several elements of typical developer work. The title was intended to reflect the productivity implications suggested by the results and heterogeneous effects, but we acknowledge it may overstate generalizability. We will revise the title to specify the controlled experimental context and add an explicit limitations paragraph discussing the single-task design and the need for future work on other task types.

revision: yes

Referee: [§4] §4 (Results): The abstract states the 55.8% figure but supplies no sample size, randomization details, exclusion criteria, or statistical test results. These elements are required to evaluate whether the observed difference is statistically reliable and not driven by small-N artifacts or selection.

Authors: The full manuscript reports these details in the experimental design and results sections. However, the referee is correct that the abstract omits them. We will revise the abstract to include the sample size and a statement on statistical significance while remaining within length constraints, and ensure the abstract directs readers to the methods for randomization and exclusion criteria.

revision: yes
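
A statement on statistical reliability is often given as an interval around the headline percentage. The sketch below shows one generic way to obtain such an interval, a percentile bootstrap on the relative reduction in mean completion time. How the paper computed its own interval is not described on this page, so the method, the function name, and the data are all assumptions for illustration.

```python
import random
from statistics import mean


def bootstrap_reduction_ci(treated, control, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the percent reduction in mean time.

    Assumes all completion times are positive.
    """
    rng = random.Random(seed)
    treated, control = list(treated), list(control)
    estimates = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        estimates.append(100.0 * (mean(c) - mean(t)) / mean(c))
    estimates.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical completion times in minutes (NOT the paper's data).
treated_times = [62, 70, 75, 58, 81, 66]
control_times = [150, 171, 140, 165, 158, 180]
low, high = bootstrap_reduction_ci(treated_times, control_times)
print(f"95% interval for the reduction: {low:.1f}% to {high:.1f}%")
```

A wide interval signals that the point estimate alone says little about the size of the effect, which is why the sample size and the interval belong next to the headline number.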

Circularity Check

0 steps flagged

Empirical experiment with direct measurement; no derivation chain present

The paper reports results from a controlled experiment measuring task completion time for implementing an HTTP server in JavaScript. The central claim (55.8% faster for treatment group) is an observed empirical difference under the stated conditions, with no equations, fitted parameters, predictions, or first-principles derivations that could reduce to the inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes are invoked. This is a standard empirical study whose validity rests on experimental design and external validity concerns rather than any circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are invoked; the work is an empirical experiment whose central claim rests on the validity of the controlled comparison.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: The treated group completed the task 55.8% faster than the control group (95% CI 21-89%).

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: We calculated two metrics as a measure of performance for each group: task success and task completion time.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Mise en Place for Agentic Coding: Deliberate Preparation as Context Engineering Methodology
cs.SE 2026-05 unverdicted novelty 7.0

The Mise en Place methodology uses contextual grounding, collaborative specification, and task decomposition to prepare AI agents for coding tasks, demonstrated in a hackathon where two hours of prep enabled rapid par...
Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49%
cs.SE 2026-04 unverdicted novelty 7.0

Adding product context retrieval to AI coding agents raises decision compliance from 46% to 95% on a new benchmark of 8 tasks with 41 weighted decision points.
The software space of science
cs.DL 2026-04 unverdicted novelty 7.0

A network analysis of software mentions in 1.3 million papers identifies 520 tools in eight communities and shows disciplines maintain distinct, stable tool portfolios that are crystallizing toward common sets.
AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub
cs.SE 2026-04 accept novelty 7.0

AgenticFlict is a public dataset of 29K+ textual merge conflicts from AI agent PRs, collected via merge simulation on 107K processed PRs and showing a 27.67% conflict rate with variation across agents.
From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI
cs.SE 2026-03 unverdicted novelty 7.0

The paper introduces a Triple Debt Model with cognitive debt and intent debt alongside technical debt to address risks from generative AI in software development.
Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development
cs.SE 2026-03 unverdicted novelty 7.0

Vibe Code Bench evaluates AI models on building complete web applications from specs, with the best of 16 models achieving 61.8% accuracy on the test split using autonomous browser evaluation.
"Tab, Tab, Bug": Security Pitfalls of Next Edit Suggestions in AI-Integrated IDEs
cs.CR 2026-02 conditional novelty 7.0

NES systems in AI IDEs expand attack surfaces via context poisoning from imperceptible actions and global codebase retrieval, with professional developers largely unaware of the risks.
Agentic Much? Adoption of Coding Agents on GitHub
cs.SE 2026-01 conditional novelty 7.0

Coding agents reached 22-29% adoption in GitHub projects within months of release, with agent-assisted commits larger and focused on features and bug fixes.
Code Comprehension with GitHub Copilot: Performance Gains, Comprehension Trade-offs, and Behavioral Predictors in Brownfield Programming
cs.SE 2025-11 conditional novelty 7.0

Copilot boosts performance in brownfield tasks but decouples from comprehension unless users actively verify generated code, with verification frequency predicting understanding at r=0.96.
From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems
cs.MA 2025-06 accept novelty 7.0

A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.
From Preventive to Reactive: How AI Coding Assistants Transform Developers' Security Awareness
cs.HC 2026-05 unverdicted novelty 6.0

Semi-structured interviews and task observations with 15 engineers show AI coding assistants reorganize security awareness from preventive to reactive, decoupling knowledge from prompt behavior and prompting informal ...
Why Are Agentic Pull Requests Merged or Rejected? An Empirical Study
cs.SE 2026-05 unverdicted novelty 6.0

Analysis of 9,799 human-reviewed agentic PRs shows only 35.7% of rejections reflect clear agent failures, with 31.2% due to workflow constraints and 33.1% lacking clear rationale, plus notable interaction differences ...
One Developer Is All You Need: A Case Study of an AI-Augmented One-Person Squad in a Brownfield Enterprise
cs.SE 2026-05 unverdicted novelty 6.0

Case study reports one staff engineer with four AI agents delivering a four-person-scoped brownfield project in half the planned time under Spec-Driven Development, with high code acceptance and major cost savings.
Multi-agent AI systems outperform human teams in creativity
cs.CL 2026-05 unverdicted novelty 6.0

Multi-agent LLM teams outperform human teams in creativity (d=1.50) across tasks by producing more novel ideas, with distinct semantic exploration patterns predicting success for each group.
uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs
cs.CR 2026-05 unverdicted novelty 6.0

uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.
Generative AI Fuels Solo Entrepreneurship, but Teams Still Lead at the Top
econ.GN 2026-05 unverdicted novelty 6.0

Generative AI boosted solo entrepreneurial entry on Product Hunt after ChatGPT but teams still dominate the top quality tiers.
SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs
cs.SE 2026-05 unverdicted novelty 6.0

SynConfRoute routes code completions using syntax validation and token confidence, improving pass@1 by up to 31% on hard tasks and reducing accelerator usage by 58% versus always using the largest model.
HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems
cs.AI 2026-05 unverdicted novelty 6.0

HAAS combines governance rules with contextual bandits to adaptively allocate tasks across a five-mode autonomy spectrum, showing that moderate governance improves manufacturing outcomes and that no single setting dominates.
Upskilling with Generative AI: Practices and Challenges for Freelance Knowledge Workers
cs.HC 2026-04 unverdicted novelty 6.0

Freelancers use generative AI to support exploratory skill acquisition but not as their main resource due to reliability issues, leading to a shift toward survival-oriented upskilling and the emergence of invisible co...
Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis
cs.SE 2026-04 conditional novelty 6.0

SpecValidator detects lexical vagueness, under-specification, and syntax-formatting defects in LLM code-generation prompts with F1 0.804, outperforming GPT-5-mini and Claude Sonnet 4, and shows that under-specificatio...
Generative artificial intelligence reduces social welfare through model collapse
physics.soc-ph 2026-04 unverdicted novelty 6.0

A game-theoretic model shows that individually rational adoption of generative AI causes model collapse that reduces collective social welfare for important tasks, with habit formation creating spillovers from low-sta...
BONSAI: A Mixed-Initiative Workspace for Human-AI Co-Development of Visual Analytics Applications
cs.HC 2026-04 unverdicted novelty 6.0

BONSAI introduces a four-layer architecture and four-phase workflow for human-AI co-development of visual analytics applications, shown in case studies to enable efficient novel tool creation and reconstruction from p...
Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
cs.SE 2026-04 unverdicted novelty 6.0

Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.
When LLMs Lag Behind: Knowledge Conflicts from Evolving APIs in Code Generation
cs.SE 2026-04 unverdicted novelty 6.0

LLMs produce executable code only 42.55% of the time under API evolution without full documentation, improving to 66.36% with structured docs and by 11% more with reasoning strategies, yet outdated patterns persist.
REAgent: Requirement-Driven LLM Agents for Software Issue Resolution
cs.SE 2026-04 unverdicted novelty 6.0

REAgent improves LLM patch generation for software issues by 17.4% on average through automated construction, quality checking, and iterative refinement of structured issue-oriented requirements.
Debt Behind the AI Boom: A Large-Scale Empirical Study of AI-Generated Code in the Wild
cs.SE 2026-03 unverdicted novelty 6.0

AI coding assistants introduce code issues that persist in 22.7% of cases across real projects, creating measurable long-term technical debt.
Agentic Inequality
cs.CY 2025-10 unverdicted novelty 6.0

Introduces the concept of agentic inequality and develops a three-dimensional framework (availability, quality, quantity) to analyze how autonomous AI agents could deepen or mitigate existing divides through scalable ...
PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes
cs.SE 2025-05 conditional novelty 6.0

Empirical analysis of 338 PRs with self-admitted ChatGPT usage shows low full integration (median 25%), selective adaptation patterns, and broader influence on developer reasoning during reviews.
The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot
cs.SE 2024-10 unverdicted novelty 6.0

Analysis of GitHub Copilot usage shows a 5.9% increase in project code contributions offset by 8% more coordination time, yielding net positive effects on code merges with varying impacts on core and peripheral developers.
StarCoder 2 and The Stack v2: The Next Generation
cs.SE 2024-02 accept novelty 6.0

StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
GPT-4 Technical Report
cs.CL 2023-03 unverdicted novelty 6.0

GPT-4 is a scaled Transformer model with post-training alignment that reaches human-level performance on academic and professional benchmarks via infrastructure enabling performance prediction from much smaller models.
The Impact of AI Coding Assistants on Software Engineering: A Longitudinal Study
cs.SE 2026-05 unverdicted novelty 5.0

Longitudinal surveys show AI coding assistants reduce time on code writing but increase supervisory verification tasks, with stable productivity perceptions yet rising reports of worsened developer experience.
One Developer Is All You Need: A Case Study of an AI-Augmented One-Person Squad in a Brownfield Enterprise
cs.SE 2026-05 conditional novelty 5.0

One experienced developer with four AI agents delivered a four-person scoped brownfield project in half the time at over 85% lower staffing cost under a spec-driven process.
Assistance to Autonomy: A Systematic Literature Review of Agentic AI across the Software Development Life Cycle
cs.SE 2026-05 unverdicted novelty 5.0

Systematic review of agentic AI in the SDLC finds output verifiability drives industrial adoption in later phases, with Planner-Executor-Reviewer as the dominant pattern, plus a new multi-agent LLM screening pipeline ...
A Generative AI Driven Interactive Narrative Serious Game for Stress Relief and Its Randomized Controlled Pilot Study
cs.HC 2026-05 unverdicted novelty 5.0

Reverie is a new AI-powered game that reduced stress levels in a pilot study of 20 students while providing excellent user experience and improved cognitive emotion regulation.
A Generative AI Driven Interactive Narrative Serious Game for Stress Relief and Its Randomized Controlled Pilot Study
cs.HC 2026-05 unverdicted novelty 5.0

Pilot study of a ChatGPT-driven narrative game found significant stress reduction (p=0.016) and positive user experience among 20 stressed students.
A meta-analysis of the effect of generative AI on productivity and learning in programming
cs.SE 2026-05 unverdicted novelty 5.0

Meta-analysis of 23 studies shows moderate productivity gains from GenAI coding assistants (Hedges' g=0.33) but no significant effect on learning (g=0.14).
Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap
cs.SE 2026-05 unverdicted novelty 5.0

Comparative review of AI coding tool ToS shows responsibility for code quality and compliance shifted to users, with policy misalignment for autonomous agents, plus a research roadmap.
HAAS: A Policy-Aware Framework for Adaptive Task Allocation Between Humans and Artificial Intelligence Systems
cs.AI 2026-05 unverdicted novelty 5.0

HAAS is an implemented framework using rule-based governance and contextual bandits to adapt human-AI task allocation, with empirical results showing tunable governance can improve manufacturing performance and reduce...
The Productivity-Reliability Paradox: Specification-Driven Governance for AI-Augmented Software Development
cs.SE 2026-05 unverdicted novelty 5.0

The Productivity-Reliability Paradox arises because AI code generators produce variable output while developers lack sufficient specification discipline, making governance models focused on specifications the binding ...
Addressing the Reality Gap: A Three-Tension Framework for Agentic AI Adoption
cs.CY 2026-04 unverdicted novelty 5.0

A three-tension framework is introduced to help navigate the adoption of autonomous agentic AI systems in K-12 and higher education by addressing practical, temporal, and value-based challenges.
Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering
cs.SE 2026-04 unverdicted novelty 5.0

Agentic AI systems are shifting software engineering from line-level code generation to delegated repository-scale execution under supervision, with SWE-bench performance rising from 1.96% to 78.4% and productivity ga...
Relationships Between Trust, Compliance, and Performance for Novice Programmers Using AI Code Generation
cs.HC 2026-04 unverdicted novelty 5.0

Among novice programmers using AI code generators, trust did not predict compliance with suggestions, while performance correlated with both compliance and increased subsequent trust.
More Is Different: Toward a Theory of Emergence in AI-Native Software Ecosystems
cs.SE 2026-04 unverdicted novelty 5.0

AI-native software ecosystems exhibit emergent behaviors best explained by complex adaptive systems theory, requiring new ecosystem-level monitoring and seven testable propositions that may extend or replace Lehman's laws.
Scaling Human-AI Coding Collaboration Requires a Governable Consensus Layer
cs.SE 2026-04 unverdicted novelty 5.0

Agentic Consensus replaces code as the main artifact with a typed property graph world model that maintains commitments and evidence through synchronization operators, shifting evaluation to alignment fidelity and con...
Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure
cs.SE 2026-04 unverdicted novelty 5.0

Sema Code decouples AI coding agents into a programmable npm library with eight mechanisms for isolation, queuing, compression, scheduling, permissions, and integration.
Generative AI and Two-Tiered Online Mental Health Communities
cs.CY 2026-04 unverdicted novelty 5.0

A quasi-natural experiment on a leading OMHC finds that generative AI integration increases counselor public posting intensity, triggers heterogeneous responses by motivation type, and produces cross-tier spillovers t...
The AI Codebase Maturity Model: From Assisted Coding to Fully Autonomous Systems
cs.SE 2026-04 conditional novelty 5.0

The AI Codebase Maturity Model defines six sequential levels of AI-driven development based on feedback loop topologies, validated by experience reports showing 5x PR and 37x issue throughput gains from level 2 to level 6.
Reproducibility Beyond Artifacts: Interactional Support for Collaborative Machine Learning
cs.HC 2026-04 unverdicted novelty 5.0

Collaborative ML reproducibility requires socio-technical interactional support beyond artifacts, demonstrated via a clinical deployment and addressed by a proposed two-layer system with an AI semantic interface.
EcoAssist: Embedding Sustainability into AI-Assisted Frontend Development
cs.HC 2026-04 unverdicted novelty 5.0

EcoAssist embeds energy estimation and optimization into AI-assisted frontend coding, reducing website energy use by 13-16% in benchmarks while preserving developer productivity.
The Fast and Spurious: Developer Productivity with GenAI
cs.SE 2025-10 conditional novelty 5.0

Survey of 415 developers finds GenAI accelerates coding output but redistributes effort into review and verification, making net productivity gains appear spurious at current adoption levels.
Vibe Coding in Product Teams: Reconfiguring AI-Assisted Workflows, Prototyping, and Collaboration
cs.HC 2025-09 accept novelty 5.0

Interviews reveal a four-stage vibe coding workflow that accelerates prototyping while introducing tensions between quick efficiency and reflective design intention, plus asymmetries in trust and ownership.
AI-Generated Slides: Are They Good? Can Students Tell?
cs.AI 2026-05 accept novelty 4.0

Coding-assistant AI tools generate slides that educators judge accurate and pedagogically sound, students rate them equal to instructor slides, and cannot reliably identify them as AI-generated.
Reliable AI Needs to Externalize Implicit Knowledge: A Human-AI Collaboration Perspective
cs.AI 2026-05 unverdicted novelty 4.0

Reliable AI needs structured Knowledge Objects to externalize and enable human validation of implicit knowledge that current methods cannot verify.
Addressing the Reality Gap: A Three-Tension Framework for Agentic AI Adoption
cs.CY 2026-04 unverdicted novelty 4.0

Presents a three-tension framework for evaluating and designing agentic AI initiatives in K-12 and higher education.
Recommendations for Efficient and Responsible LLM Adoption within Industrial Software Development
cs.SE 2026-04 conditional novelty 4.0

A multi-case study plus survey produces seven actionable recommendations for efficient and responsible LLM use in industrial software engineering.
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models
cs.SE 2026-04 unverdicted novelty 4.0

CTT is a compression pipeline for LLMs that achieves up to 49x memory reduction, 10x faster inference, 81% lower CO2 emissions, and retains 68-98% accuracy on code clone detection, summarization, and generation tasks.
AI Observability for Developer Productivity Tools: Bridging Cost Awareness and Code Quality
cs.SE 2026-04 unverdicted novelty 4.0

A combined observability platform for AI developer tools achieves under 2% cost variance from actual billing and speeds up usage insights by an order of magnitude through real token tracking and analytics over a six-m...
Comparing AI Coding Agents: A Task-Stratified Analysis of Pull Request Acceptance
cs.SE 2026-02 unverdicted novelty 4.0

Task type dominates AI coding agent PR acceptance rates, with documentation at 82.1% versus 66.1% for new features, and no single agent best across all categories.
Developer Productivity With and Without GitHub Copilot: A Longitudinal Mixed-Methods Case Study
cs.SE 2025-09 unverdicted novelty 4.0

A longitudinal mixed-methods case study in a large public-sector IT organization found no statistically significant increase in commit activity after GitHub Copilot adoption, despite pre-existing activity differences ...